A METHOD OF DETERMINING THE CONSTANTS
IN THE BIMODAL FOURTH DEGREE
EXPONENTIAL FUNCTION


By
A. L. O'Toole

In a paper in this Journal¹ the present writer has discussed
some of the mathematical properties of a class of definite integrals
which arise in the study of the frequency function

$$y = e^{-a^2(x^4 + p_1x^3 + p_2x^2 + p_3x + p_4)}, \qquad a^2 > 0. \tag{1}$$

This function defines the system of frequency curves for which
the method of moments is the best method of fitting², i.e. best in
the sense of maximum likelihood, and this fact gives importance
to its study. The curves are typically bimodal, the nature and
location of the modes being given by the roots of the equation

$$4x^3 + 3p_1x^2 + 2p_2x + p_3 = 0. \tag{2}$$


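The first problem which arose was that of finding an expression
for the value of the definite integral

$$L = \int_{-\infty}^{\infty} e^{-a^2(x^4 + p_1x^3 + p_2x^2 + p_3x + p_4)}\,dx. \tag{3}$$

If $x$ is replaced by $x - p_1/4$ this integral becomes

$$L = \int_{-\infty}^{\infty} e^{-a^2(x^4 + px^2 + qx + r)}\,dx \tag{4}$$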


¹On the system of curves for which the method of moments is the
best method of fitting. Vol. IV, No. 1, Feb. 1933, p. 1.

²R. A. Fisher, On the mathematical foundations of theoretical
statistics, Philosophical Transactions of the Royal Society of London,
vol. 222, series A (1921), p. 355.
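or

$$L = k\int_{-\infty}^{\infty} e^{-a^2(x^4 + px^2 + qx)}\,dx \quad\text{where } k = e^{-a^2r}, \tag{5}$$

or, replacing $x\sqrt{a}$ by $x$ where $a$ is the positive square root of $a^2$,

$$L = \frac{k}{\sqrt{a}}\int_{-\infty}^{\infty} e^{-(x^4 + apx^2 + a^{3/2}qx)}\,dx, \tag{6}$$

or

$$L = \frac{k}{\sqrt{a}}\int_{-\infty}^{\infty} e^{-(x^4 + Px^2 + Qx)}\,dx, \tag{7}$$

where $P = ap$, $Q = a^{3/2}q$.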


No real loss of generality is incurred in studying (5), (6) 
or (7) rather than (3). For the purposes of the previous paper 
it was found convenient to discuss certain special cases of (7) 
first, then (7) itself and later (5). Having in mind the practical 
purposes of this note, however, attention will be focused first on 
the form (5) and afterwards on (3). The transformations from 
the expressions obtained in the previous paper are very simple. 
For (5) the special cases studied and a few of the more impor- 
tant results obtained may be stated here as follows: 

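Type I:

$$p = q = 0.$$

$$L_0 = k\int_{-\infty}^{\infty} e^{-a^2x^4}\,dx = \frac{k}{2\sqrt{a}}\,\Gamma\!\left(\tfrac14\right),$$

$$L_{2n} = k\int_{-\infty}^{\infty} x^{2n}e^{-a^2x^4}\,dx = \frac{k}{2a^{(2n+1)/2}}\,\Gamma\!\left(\frac{2n+1}{4}\right), \qquad n = 0, 1, 2, 3, \ldots,$$

$$L_{2n+1} = k\int_{-\infty}^{\infty} x^{2n+1}e^{-a^2x^4}\,dx = 0, \qquad n = 0, 1, 2, 3, \ldots;$$

hence

$$\mu'_{2n} = \frac{L_{2n}}{L_0} = \frac{\Gamma\!\left(\frac{2n+1}{4}\right)}{a^n\,\Gamma\!\left(\frac14\right)},$$

$$\mu'_{2n+1} = 0, \qquad n = 0, 1, 2, 3, \ldots$$

Obviously, of course, $k$ depends upon the total frequency and
hence if the total frequency is $N$

$$k = \frac{N}{\int_{-\infty}^{\infty} e^{-a^2x^4}\,dx} = \frac{2\sqrt{a}\,N}{\Gamma\!\left(\frac14\right)}.$$

This curve has a single mode located at $x = 0$ and is symmetrical
with respect to the ordinate at $x = 0$.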
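As a numerical check on the Type I results, the following Python sketch compares the closed form $\mu'_{2n} = \Gamma((2n+1)/4)/(a^n\,\Gamma(1/4))$ with a plain rectangle-rule estimate. The function names, the grid size and the cutoff (taken where $a^2x^4 = 50$) are illustrative assumptions and are not part of the paper.

```python
import math


def type1_moment_closed_form(order, a_sq):
    """Raw moment of y = k*exp(-a^2 x^4); odd moments vanish by symmetry."""
    if order % 2 == 1:
        return 0.0
    n = order // 2
    a = math.sqrt(a_sq)  # a is the positive square root of a^2
    return math.gamma((2 * n + 1) / 4) / (a ** n * math.gamma(0.25))


def type1_moment_quadrature(order, a_sq, points=40001):
    """Rectangle-rule estimate of the same moment (hypothetical helper)."""
    half_width = (50.0 / a_sq) ** 0.25  # integrand is about e^-50 at the edge
    step = 2.0 * half_width / (points - 1)
    numerator = denominator = 0.0
    for i in range(points):
        x = -half_width + i * step
        weight = math.exp(-a_sq * x ** 4)
        numerator += x ** order * weight
        denominator += weight
    # The common factor `step` cancels in the ratio.
    return numerator / denominator
```

For example, with $a^2 = 0.04$ the closed form gives $\mu'_4 = 1/(4a^2) = 6.25$, since $\Gamma(5/4) = \Gamma(1/4)/4$.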
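Type II:

$$q = 0, \qquad p = -2b, \qquad b > 0,$$

$$L_0 = k\int_{-\infty}^{\infty} e^{-a^2(x^4 - 2bx^2)}\,dx$$

$$= \frac{k}{2\sqrt{a}}\left\{\Gamma\!\left(\tfrac14\right)\left[1 + \frac{a^2b^2}{2!} + \frac{5\cdot1\,a^4b^4}{4!} + \frac{9\cdot5\cdot1\,a^6b^6}{6!} + \frac{13\cdot9\cdot5\cdot1\,a^8b^8}{8!} + \cdots\right] + 2ab\,\Gamma\!\left(\tfrac34\right)\left[1 + \frac{3\,a^2b^2}{3!} + \frac{7\cdot3\,a^4b^4}{5!} + \cdots\right]\right\}.$$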


It was shown that this integral could be expressed in terms of the
Bessel functions $J_{1/4}$ and $J_{-1/4}$ as follows:


1 a36° 
ra 4 raeiF aul -1@ =1a%"), 21) 


° V@ 
where | 
24 GIT) 


V-4 


20U0GITU#) 
B= er ? a =\/-1. 
If the total frequency is $N$

$$k = \frac{N}{\int_{-\infty}^{\infty} e^{-a^2(x^4 - 2bx^2)}\,dx}.$$

This curve is symmetrical with respect to the ordinate at $x = 0$
and has two real modes located at $x = \pm\sqrt{b}$.


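Type III: $p = 0$.

$$L_0 = k\int_{-\infty}^{\infty} e^{-a^2(x^4 + qx)}\,dx$$

$$= \frac{k}{2\sqrt{a}}\left\{\Gamma\!\left(\tfrac14\right)\left[1 + \frac{(a^{3/2}q)^4}{4\cdot4!} + \frac{5\,(a^{3/2}q)^8}{4^2\cdot8!} + \frac{9\cdot5\,(a^{3/2}q)^{12}}{4^3\cdot12!} + \cdots\right] + \Gamma\!\left(\tfrac34\right)\left[\frac{(a^{3/2}q)^2}{2!} + \frac{3\,(a^{3/2}q)^6}{4\cdot6!} + \frac{7\cdot3\,(a^{3/2}q)^{10}}{4^2\cdot10!} + \cdots\right]\right\},$$

$$k = \frac{N}{\int_{-\infty}^{\infty} e^{-a^2(x^4 + qx)}\,dx}.$$

This curve is not symmetrical and has only one real mode, that
mode being located at $x$ equal to the real cube root of negative $q/4$.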
Type IV: The general case.

$$L = k\int_{-\infty}^{\infty} e^{-a^2(x^4 + px^2 + qx)}\,dx = \frac{k}{\sqrt{a}}\int_{-\infty}^{\infty} e^{-(x^4 + apx^2 + a^{3/2}qx)}\,dx = \frac{k}{\sqrt{a}}\int_{-\infty}^{\infty} e^{-(x^4 + Px^2 + Qx)}\,dx.$$

It was shown that the value of this integral could be expressed
as an infinite series each term of which involved two Bessel functions.
But, as pointed out near the close of the previous paper,
although this infinite series may be considered a theoretical solution
of the problem, it does not lead to a simple method of
determining the constants $a$, $p$, $q$, $k$ which appear in the frequency
function. It is the purpose of this note to give a practical method
of determining these constants.
Beginning with (5)

$$L = k\int_{-\infty}^{\infty} e^{-a^2(x^4 + px^2 + qx)}\,dx$$

the $n$-th moment $\mu'_n$ is defined by

$$\mu'_n = \frac{\int_{-\infty}^{\infty} x^n e^{-a^2(x^4 + px^2 + qx)}\,dx}{\int_{-\infty}^{\infty} e^{-a^2(x^4 + px^2 + qx)}\,dx}. \tag{8}$$
Integrate $L$ by parts, letting $u = e^{-a^2(x^4 + px^2)}$ and $dv = e^{-a^2qx}\,dx$.
Then

$$L = -\frac{k}{q}\int_{-\infty}^{\infty} (4x^3 + 2px)\,e^{-a^2(x^4 + px^2 + qx)}\,dx. \tag{9}$$

Divide by $L$ and multiply by $q$ and the result is

$$q = -(4\mu'_3 + 2p\mu'_1). \tag{10}$$
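Start again with $L$ in the form (5) and integrate by parts letting
$u = e^{-a^2(x^4 + px^2 + qx)}$ and $dv = dx$. Then

$$L = a^2k\int_{-\infty}^{\infty} (4x^4 + 2px^2 + qx)\,e^{-a^2(x^4 + px^2 + qx)}\,dx. \tag{11}$$

Divide by $L$ and then

$$1 = a^2(4\mu'_4 + 2p\mu'_2 + q\mu'_1)$$

or

$$a^2 = \frac{1}{4\mu'_4 + 2p\mu'_2 + q\mu'_1}. \tag{12}$$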
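Relations (10) and (12) can be checked numerically on a curve whose constants are known in advance. The sketch below computes the raw moments of $e^{-a^2(x^4+px^2+qx)}$ with a rectangle rule and then recovers $q$ and $a^2$ from them. The helper names, the grid size and the trial constants are assumptions chosen only for illustration.

```python
import math


def raw_moments(a_sq, p, q, max_order, points=80001):
    """Rectangle-rule raw moments mu'_1..mu'_max_order of exp(-a^2(x^4+px^2+qx))."""
    # Widen the grid until the integrand is negligible outside it.
    half_width = 1.0
    while a_sq * (half_width ** 4 - abs(p) * half_width ** 2
                  - abs(q) * half_width) < 60.0:
        half_width *= 2.0
    step = 2.0 * half_width / (points - 1)
    sums = [0.0] * (max_order + 1)
    for i in range(points):
        x = -half_width + i * step
        weight = math.exp(-a_sq * (x ** 4 + p * x * x + q * x))
        power = 1.0
        for n in range(max_order + 1):
            sums[n] += power * weight
            power *= x
    return {n: sums[n] / sums[0] for n in range(1, max_order + 1)}


def q_from_moments(mu, p):
    """Relation (10): q = -(4 mu'_3 + 2 p mu'_1)."""
    return -(4.0 * mu[3] + 2.0 * p * mu[1])


def a_sq_from_moments(mu, p, q):
    """Relation (12): a^2 = 1 / (4 mu'_4 + 2 p mu'_2 + q mu'_1)."""
    return 1.0 / (4.0 * mu[4] + 2.0 * p * mu[2] + q * mu[1])
```

With the trial values $a^2 = 0.01$, $p = -20$, $q = 3$, the moments returned by `raw_moments` reproduce $q$ and $a^2$ to within quadrature error.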
2 
Now integrate (11) by parts with $u = e^{-a^2(x^4 + px^2 + qx)}$ and
$dv = (4x^4 + 2px^2 + qx)\,dx$. This leads to

$$L = \frac{a^4k}{30}\int_{-\infty}^{\infty} (96x^8 + 128px^6 + 84qx^5 + 40p^2x^4 + 50pqx^3 + 15q^2x^2)\,e^{-a^2(x^4 + px^2 + qx)}\,dx. \tag{13}$$

Divide by $L$ and obtain

$$1 = \frac{a^4}{30}\,(96\mu'_8 + 128p\mu'_6 + 84q\mu'_5 + 40p^2\mu'_4 + 50pq\mu'_3 + 15q^2\mu'_2)$$
or

$$a^4 = \frac{30}{96\mu'_8 + 128p\mu'_6 + 84q\mu'_5 + 40p^2\mu'_4 + 50pq\mu'_3 + 15q^2\mu'_2}. \tag{14}$$

Squaring in (12) the result is

$$a^4 = \frac{1}{(4\mu'_4 + 2p\mu'_2 + q\mu'_1)^2} = \frac{1}{16\mu_4'^2 + 4p^2\mu_2'^2 + q^2\mu_1'^2 + 16p\mu'_2\mu'_4 + 8q\mu'_1\mu'_4 + 4pq\mu'_1\mu'_2}. \tag{15}$$

Eliminating $a^4$ between (14) and (15) gives the equation

$$p^2(40\mu'_4 - 120\mu_2'^2) + q^2(15\mu'_2 - 30\mu_1'^2) + pq(50\mu'_3 - 120\mu'_1\mu'_2) + p(128\mu'_6 - 480\mu'_2\mu'_4) + q(84\mu'_5 - 240\mu'_1\mu'_4) + (96\mu'_8 - 480\mu_4'^2) = 0. \tag{16}$$


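Using relation (10) and

$$q^2 = 16\mu_3'^2 + 16p\mu'_1\mu'_3 + 4p^2\mu_1'^2 \tag{17}$$

eliminate $q$ from (16) obtaining

$$5Ap^2 + 2Bp + 2C = 0 \tag{18}$$

and hence

$$p = \frac{-B \pm \sqrt{B^2 - 10AC}}{5A} \tag{19}$$

where

$$\begin{aligned}
A &= 2\mu'_4 - 6\mu_2'^2 + 15\mu_1'^2\mu'_2 - 5\mu'_1\mu'_3 - 6\mu_1'^4,\\
B &= 90\mu'_1\mu'_2\mu'_3 - 60\mu_1'^3\mu'_3 + 16\mu'_6 - 60\mu'_2\mu'_4 - 21\mu'_1\mu'_5 + 60\mu_1'^2\mu'_4 - 25\mu_3'^2,\\
C &= 30\mu'_2\mu_3'^2 - 60\mu_1'^2\mu_3'^2 - 42\mu'_3\mu'_5 + 120\mu'_1\mu'_3\mu'_4 + 12\mu'_8 - 60\mu_4'^2.
\end{aligned} \tag{20}$$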
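The computation in (18) to (20) is mechanical once the moments are in hand. The following Python sketch evaluates $A$, $B$, $C$ as the coefficients obtained by substituting (10) and (17) into (16), and returns both candidate values of $p$ from (19). The dictionary layout for the moments and the function names are assumptions made for this illustration.

```python
import math


def abc_coefficients(mu):
    """A, B, C of (20); `mu` maps the order n to the raw moment mu'_n."""
    m1, m2, m3, m4, m5, m6, m8 = (mu[n] for n in (1, 2, 3, 4, 5, 6, 8))
    a_coef = 2 * m4 - 6 * m2 ** 2 + 15 * m1 ** 2 * m2 - 5 * m1 * m3 - 6 * m1 ** 4
    b_coef = (90 * m1 * m2 * m3 - 60 * m1 ** 3 * m3 + 16 * m6 - 60 * m2 * m4
              - 21 * m1 * m5 + 60 * m1 ** 2 * m4 - 25 * m3 ** 2)
    c_coef = (30 * m2 * m3 ** 2 - 60 * m1 ** 2 * m3 ** 2 - 42 * m3 * m5
              + 120 * m1 * m3 * m4 + 12 * m8 - 60 * m4 ** 2)
    return a_coef, b_coef, c_coef


def p_candidates(mu):
    """Both roots of 5A p^2 + 2B p + 2C = 0, as in (19)."""
    a_coef, b_coef, c_coef = abc_coefficients(mu)
    root = math.sqrt(b_coef ** 2 - 10 * a_coef * c_coef)
    return ((-b_coef + root) / (5 * a_coef), (-b_coef - root) / (5 * a_coef))
```

Which of the two candidates to keep is decided separately, by the condition for three real roots of the modal equation.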
In order to decide upon one of the two values of $p$ furnished
by (18) notice that, equating the first derivative of the frequency
function to zero, the location of the two modes and the minimum
point between them is determined by the roots of the equation

$$4x^3 + 2px + q = 0. \tag{21}$$

The condition for three real distinct roots in this equation is

$$-8p^3 > 27q^2, \text{ which requires } p < 0, \tag{22}$$

where $q^2$ is found from (17). If $-8p^3 = 27q^2$ then one of the
modes coincides with the minimum point. If $p = q = 0$ then both
modes coincide with the minimum point.
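Condition (22) and the roots of (21) are easy to evaluate directly. The sketch below tests (22) and, when it holds, returns the three real roots by the standard trigonometric solution of the depressed cubic $x^3 + (p/2)x + q/4 = 0$. This is a swapped-in technique for illustration (the paper itself refers to Horner's method), and the function names are assumptions.

```python
import math


def has_two_modes(p, q):
    """Condition (22): -8 p^3 > 27 q^2."""
    return -8.0 * p ** 3 > 27.0 * q ** 2


def stationary_points(p, q):
    """Ascending real roots of 4x^3 + 2px + q = 0 when (22) holds."""
    if not has_two_modes(p, q):
        raise ValueError("condition (22) fails: fewer than three distinct real roots")
    big_p, big_q = p / 2.0, q / 4.0          # depressed cubic x^3 + P x + Q = 0
    radius = 2.0 * math.sqrt(-big_p / 3.0)
    cosine = max(-1.0, min(1.0, 3.0 * big_q / (big_p * radius)))
    angle = math.acos(cosine) / 3.0
    roots = [radius * math.cos(angle - 2.0 * math.pi * j / 3.0) for j in range(3)]
    return sorted(roots)
```

The outer two roots are the modes and the middle root is the minimum point between them; because the cubic has no $x^2$ term, the three roots sum to zero.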

Extracting the square root in (17) gives two values of $q$
differing only in sign. Now it is easy to show either by geometrical
considerations or by examining the algebraic manipulations
leading to (18) that $p$ is independent of the sign of $q$. Changing
$q$ to $-q$ in (5) has the same effect as changing $x$ to $-x$ or,
that is, reversing the order of the distribution and curve. Also,
changing $x$ to $-x$ leaves the even moments unaltered but changes
the sign of every odd moment. Hence if the value of the function
at the modal position on the left is greater than the value of the
function at the modal position on the right then $q$ is greater than
zero. And if the value of the function at the modal position on
the left is less than the value of the function at the modal position
on the right then $q$ is less than zero. If $q = 0$ the curve is
symmetrical with respect to the ordinate at $x = 0$. Hence $p$ and $q$
are determined by (19), (17) and (22), the sign of $q$ being
fixed by examination of the data of the problem or, if necessary,
by trial. The value of $a^2$ is then found by taking the positive
square root in (15). Of course (14) would give the same value
for $a^2$.

Now that $a^2$, $p$ and $q$ are determined, there remains only
$k$ to be found. If the total frequency is $N$ then

$$k\int_{-\infty}^{\infty} e^{-a^2(x^4 + px^2 + qx)}\,dx = N$$

and hence

$$k = \frac{N}{\int_{-\infty}^{\infty} e^{-a^2(x^4 + px^2 + qx)}\,dx},$$

where the numerical value of the integral in the denominator can
be found by mechanical quadrature to any desired degree of approximation.
For purposes of the quadrature involved here it will
be found that the simple rectangle quadrature formula will give
as good results as could be desired.³ Having found $k$ then the
constant $r$ is also known since

$$k = e^{-a^2r}, \tag{23}$$

$$r = -\frac{\log_e k}{a^2}. \tag{24}$$

The points of inflexion are located by equating the second
derivative of the function to zero. The equation is

$$a^2(4x^3 + 2px + q)^2 - 2(6x^2 + p) = 0. \tag{25}$$
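The rectangle-rule determination of $k$, followed by (24) for $r$, can be sketched as below. The function names and the `step` parameter are assumptions for illustration; with unit spacing the quadrature reduces to dividing the total frequency by a simple sum of function values.

```python
import math


def rectangle_k(total_frequency, a_sq, p, q, x_values, step=1.0):
    """k = N / integral, with the integral estimated by the rectangle rule."""
    area = step * sum(math.exp(-a_sq * (x ** 4 + p * x * x + q * x))
                      for x in x_values)
    return total_frequency / area


def r_from_k(k, a_sq):
    """Relation (24): r = -log(k) / a^2."""
    return -math.log(k) / a_sq
```

As a check on the sketch, for a Type I curve ($p = q = 0$) with a fine grid the result should agree with the closed form $k = 2\sqrt{a}\,N/\Gamma(1/4)$.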


³On the Degree of Approximation of Certain Quadrature Formulas.
Annals of Mathematical Statistics, vol. IV, No. 2, May 1933, p. 143, by
A. L. O'Toole.

If now $x$ be replaced by $x + m$ then

$$L = \int_{-\infty}^{\infty} e^{-a^2(x^4 + px^2 + qx + r)}\,dx \tag{5}$$

becomes

$$L = \int_{-\infty}^{\infty} e^{-a^2(x^4 + p_1x^3 + p_2x^2 + p_3x + p_4)}\,dx \tag{3}$$

where

$$\begin{aligned}
p_1 &= 4m,\\
p_2 &= 6m^2 + p,\\
p_3 &= 4m^3 + 2mp + q,\\
p_4 &= m^4 + m^2p + mq + r.
\end{aligned} \tag{26}$$
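The change of origin in (26) is a direct polynomial expansion, sketched below in Python. The function name is an assumption; the relations themselves are those of (26).

```python
def shifted_coefficients(p, q, r, m):
    """Coefficients p1..p4 of (26) after replacing x by x + m."""
    p1 = 4 * m
    p2 = 6 * m ** 2 + p
    p3 = 4 * m ** 3 + 2 * m * p + q
    p4 = m ** 4 + m ** 2 * p + m * q + r
    return p1, p2, p3, p4
```

For instance, `shifted_coefficients(-202.7862, 29.4284, -3472.578, -25)` moves the origin of a curve with those constants by 25 units, giving $p_1 = -100$ and $p_2 = 3547.2138$.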
The data⁴ in the first two columns of the table given here
will provide the basis for an illustration of the method described
above for determining the constants. The numbers in the first
column are the classes into which the plants were divided. In the
second column are found the frequencies corresponding to the
various classes. In constructing the third column the origin for
$x$ was arbitrarily placed to correspond to the class 25. Taking

$$\mu'_n = \frac{\sum x^n f(x)}{\sum f(x)},$$

the first six moments and the eighth moment are found to be






= 0.1946903, 





“e 

uss on 3. 16572, 
j= EERE - 27, 09735, 
uj= 4002204 - 2212. 9785, 
j= 1EF8E0 ~ 410, 3540, 


ui 43 4 1296. 209228.5, 


! nme 72504_ 
lee =2I5071842 . 


⁴This data, except for slight modifications, was extracted from that
of W. L. Tower on the Seriation of Counts of Rays of Chrysanthemum
Leucanthemum, Biometrika No. 1, 1901-2, p. 313.
























Class     x     Column four     Column five
  9     -16        .031102         .077755
 10     -15        .300472         .751180
 11     -14       1.583594        3.958985
 12     -13       4.991068       12.477670
 13     -12      10.246676       25.616690
 14     -11      14.831798       37.079495
 15     -10      16.280112       40.700280
 16      -9      14.482540       36.206350
 17      -8      11.089036       27.722590
 18      -7       7.712195       19.280487
 19      -6       5.108903       12.772257
 20      -5       3.359072        8.397680
 21      -4       2.269766        5.674415
 22      -3       1.621764        4.054410
 23      -2       1.252760        3.131900
 24      -1       1.062910        2.657275
 25       0       1.000000        2.500000
 26       1       1.046530        2.616325
 27       2       1.214445        3.036112
 28       3       1.547935        3.869837
 29       4       2.133051        5.332627
 30       5       3.108096        7.770240
 31       6       4.654336       11.635840
 32       7       6.917724       17.294310
 33       8       9.793409       24.483522
 34       9      12.593310       31.483275
 35      10      13.938151       34.845377
 36      11      12.502565       31.256412
 37      12       8.504388       21.260970
 38      13       4.078577       10.196442
 39      14       1.274131        3.185327
 40      15        .238029         .595072
 41      16        .024259         .060647
[Figure: the original distribution and the fitted curve, with frequency plotted against class.]
Formulas (20) give
$A = -1409.786$,
$B = -790428.9$,
$C = -15354106$.
Hence from (19)
$p = -202.7862$
or
$p = -21.48292$.
But $p = -21.48292$ and the value of $q$ to which it leads do
not satisfy the relation (22), hence use $p = -202.7862$. Calculate
$q$ from (17) and use the positive square root since an
examination of the data shows that the value of the function at
the left modal value is greater than the value of the function at
the right modal value. Hence
$q = 29.4284$.
Formula (15) now gives as the positive square root
$a^2 = .0002639$.
Using these values for $a^2$, $p$, $q$ the values of the function

$$e^{-a^2(x^4 + px^2 + qx)}$$

are calculated for integral values of $x$ from $x = -16$ to $x = 16$
and tabulated in column four. The constant $k$ is then found by
dividing the total frequency 452 by the sum of column four.
Hence

$$k = \frac{452}{180.792704} = 2.500100.$$

By (24)
$r = -3472.578$.

The function can now be written

$$y = 2.5\,e^{-.000264(x^4 - 202.7862x^2 + 29.4284x)}, \text{ taking } k = 2.5,$$

or

$$y = e^{-.000264(x^4 - 202.7862x^2 + 29.4284x - 3472.578)}.$$
The values of the ordinates for this function are given in
column five and to the nearest integer in column six.
Equation (21) becomes

$$4x^3 - 2(202.7862)x + 29.4284 = 0$$

which has the roots (approximate) $x = -10.1$, $x = .07$, $x = 10.03$.
It should be noted that the sum of the three roots must equal
zero. Hence the modes are located at $x = -10.1$ and at $x = 10.03$
with the minimum point at $x = .07$. These roots can be determined
to any desired number of decimal places by Horner's
method.

If now $x$ is replaced by $x - 25$ so that the new values of $x$
are respectively equal to the numbers in the class column, the
function becomes

$$y = e^{-.000264(x^4 - 100x^3 + 3547.214x^2 - 52331.26x + 259675.39)}.$$

The modes are now located at $x = 14.9$ and $x = 35.03$ with the
minimum point at $x = 25.07$. In the figure are shown the original
distribution and the curve represented by this equation.











ON THE TCHEBYCHEF INEQUALITY OF 
BERNSTEIN 


By 
Cecil C. Craig¹


From Tchebychef's inequality we know that if $x_1, x_2, \ldots, x_n$
are a set of independent statistical variables with

$$E(x_1) = E(x_2) = \cdots = E(x_n) = 0,$$

and

$$\sigma^2 = \sigma_1^2 + \sigma_2^2 + \cdots + \sigma_n^2,$$

then the probability $P$ that

$$-t\sigma \le x_1 + x_2 + \cdots + x_n \le t\sigma$$

satisfies the inequality,

$$P \ge 1 - \frac{1}{t^2}.$$

This gives a lower limit for $P$ which is often unsatisfactory.
Improvement of this result requires further hypotheses. As is
well known, Pearson, Camp, Guldberg, Meidell, Narumi,² and
Smith³ have attacked this problem with considerable success.
Another interesting and important attempt in this direction due to
S. Bernstein seems to have generally escaped attention in the
English-speaking world, at least, since it has been published only in
Russian.⁴ Because of the latter fact, it seems necessary to give

¹This paper was written in substantially its present form during the
author's tenure of a National Research Fellowship at Stanford University.

²For references to all these papers except Smith's and a brief discussion
see Rietz, H. L., Mathematical Statistics, (Open Court Publishing
Company, Chicago, 1927), pp. 140-144.

³Smith, C. D., On Generalized Tchebychef Inequalities in Mathematical
Statistics, American Journal of Mathematics, Vol. 52, (1930), pp. 109-126.

⁴Bernstein, S., Theory of Probability, (Moscow, 1927), pp. 159-165.
The present account of this work of Bernstein is taken from a lecture of
Professor J. V. Uspensky.

a brief account of this work of Bernstein's preliminary to the
remarks based on it the writer wishes to make.
Bernstein imposed the condition in addition that

$$E(|x_i|^k) \le \frac{\sigma_i^2}{2}\,h^{k-2}\,k!, \qquad i = 1, 2, \ldots, n; \quad k = 2, 3, \ldots, \tag{1}$$

($E(x)$ is read "the mathematical expectation of $x$") in which $h$
is arbitrary (this condition is satisfied, e.g., if the $x_i$ are
bounded), and used the following lemma due to Tchebychef. Let
the statistical variable $u$ be always $> 0$. If $E(u) = A$, then the
probability $Q$ that $u \ge At^2$ satisfies the inequality, $Q < 1/t^2$.

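Then taking,

$$u = e^{\epsilon(x_1 + x_2 + \cdots + x_n)} = e^{\epsilon x_1}e^{\epsilon x_2}\cdots e^{\epsilon x_n},$$

in which $\epsilon$ is arbitrary,

$$E(u) = E(e^{\epsilon x_1})\,E(e^{\epsilon x_2})\cdots E(e^{\epsilon x_n}).$$

Now

$$e^{\epsilon x_i} = 1 + \epsilon x_i + \frac{\epsilon^2x_i^2}{2!} + \cdots$$

and under the condition (1),

$$E(e^{\epsilon x_i}) \le 1 + \frac{\epsilon^2\sigma_i^2}{2}\left(1 + |\epsilon|h + \epsilon^2h^2 + \cdots\right).$$

If it is assumed that

$$|\epsilon|h \le c < 1$$

then

$$E(e^{\epsilon x_i}) \le 1 + \frac{\epsilon^2\sigma_i^2}{2(1-c)} < e^{\epsilon^2\sigma_i^2/(2(1-c))},$$

and thus

$$E(u) < e^{\epsilon^2\sigma^2/(2(1-c))}. \tag{2}$$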

If in the inequality, $u \ge Ae^{t^2}$, a greater quantity is substituted
for $A$, then certainly $Q < e^{-t^2}$. Therefore the probability $Q$ of

$$u \ge e^{t^2}\,e^{\epsilon^2\sigma^2/(2(1-c))}$$

satisfies the inequality

$$Q < e^{-t^2}.$$

Now

$$e^{\epsilon(x_1 + x_2 + \cdots + x_n)} \ge e^{t^2 + \epsilon^2\sigma^2/(2(1-c))}$$

implies for $\epsilon > 0$,

$$x_1 + x_2 + \cdots + x_n \ge \frac{t^2}{\epsilon} + \frac{\epsilon\sigma^2}{2(1-c)}.$$

The value of $\epsilon$ is next chosen so as to make
$t^2/\epsilon + \epsilon\sigma^2/(2(1-c))$ a minimum. Thus

$$\epsilon^2 = \frac{2t^2(1-c)}{\sigma^2}.$$

Then the probability $Q$ that

$$x_1 + x_2 + \cdots + x_n \ge t\sigma\left(\frac{2}{1-c}\right)^{1/2}$$

satisfies the inequality,

$$Q < e^{-t^2}$$

if $\epsilon^2 = 2t^2(1-c)/\sigma^2$ and $\epsilon h \le c$ with $c < 1$.
To get the corresponding result for the lower limit of the
sum $x_1 + x_2 + \cdots + x_n$, it is only necessary to choose
$\epsilon < 0$ and as before, the probability, $Q'$, that

$$x_1 + x_2 + \cdots + x_n \le -t\sigma\left(\frac{2}{1-c}\right)^{1/2}$$

satisfies the inequality,

$$Q' < e^{-t^2}$$

if also $\epsilon^2 = 2t^2(1-c)/\sigma^2$ and $|\epsilon|h \le c$ with $c < 1$.

Combining these two results, if $P$ is the probability of

$$-t\sigma\left(\frac{2}{1-c}\right)^{1/2} \le x_1 + x_2 + \cdots + x_n \le t\sigma\left(\frac{2}{1-c}\right)^{1/2},$$

then since

$$P + Q + Q' = 1,$$

$$P > 1 - 2e^{-t^2}.$$


But setting

$$t\sigma\left(\frac{2}{1-c}\right)^{1/2} = \omega,$$


and also, 
re a . 
the condition 
2(1-c) wie e* 
oe” "Ae 


, e* a 
(Bernstein set € “= pe > using merely the equality sign in this 


condition. The value of c as here given is necessary in the 
author’s developments below.) must be satisfied, or what is the 
same thing, 


2C1-c)*w*, c? 
2o4 he’ 


from which 


cn a... . 
Of +hAw 
This last quantity on the right is positive and $< 1$ as required so
that the constants can actually be chosen as specified.
This gives

$$t = \frac{\omega}{(2\sigma^2 + 2h\omega)^{1/2}},$$

and finally the probability, $P$, that

$$-\omega \le x_1 + x_2 + \cdots + x_n \le \omega$$

is such that

$$P \ge 1 - 2e^{-\omega^2/(2\sigma^2 + 2h\omega)},$$

or setting $\omega = t\sigma$,

$$P \ge 1 - 2e^{-t^2/(2 + 2ht/\sigma)}. \tag{3}$$

It is to be observed that generally the quantity $h/\sigma$ rapidly
decreases with increasing $n$.

This is the inequality reached by Bernstein under the condition (1).

If all the $x_i$'s are bounded, if, say, always

$$|x_i| \le b, \qquad i = 1, 2, \ldots, n,$$

one may take $h = b/3$.

It is the purpose of the author's remarks to discuss less severe
conditions than (1) under which the inequality (3) can be obtained.
These more general conditions are obtained, however, at
the expense of assuming quite generally satisfied regularity conditions
with regard to the "tails" of the frequency distribution
of $x$, which need not necessarily be regarded as the sum of
$n$ component variables, $x_1, x_2, \ldots, x_n$.

If we now take

$$u = e^{\epsilon x} \tag{4}$$

we have

$$E(u) = \int_{-\infty}^{\infty} dF(x)\,e^{\epsilon x} \qquad (F(x) \text{ is the probability function of } x)$$

$$= \int_{-\infty}^{\infty} dF(x)\left(1 + \epsilon x + \frac{\epsilon^2x^2}{2!} + \cdots\right).$$

~Qo 
The condition (1) insures that the series under the sign of integration
may be integrated over the interval $(-\infty, \infty)$. But the
series can also be integrated over the same interval if it converges
uniformly in any fixed finite interval, which it does, and if the

series <2 gn (y/ , where 





yY E n, ” 
Ga? =f oa) © T- - 
a 
converges uniformly in the interval (— co, oo). 
Formally, at least,

$$E(u) = 1 + \mu_2\frac{\epsilon^2}{2!} + \mu_3\frac{\epsilon^3}{3!} + \cdots, \tag{5}$$

in which $\mu_k$ is the $k$-th moment about the mean of $x$. If

$$|\mu_k| \le \frac{\sigma^2}{2}\,h^{k-2}\,k!, \qquad k = 2, 3, \ldots, \tag{6}$$

for some $h > 0$, then for $h|\epsilon| \le c < 1$ the right hand side
of (5) is convergent and is $< 1 + \epsilon^2\sigma^2/(2(1-c))$ as before. Now
let us suppose that the condition (6) is satisfied not only for
moments taken over the whole interval $(-\infty, \infty)$ but also for
moments taken over any interval which includes the interval $(-a, a)$
in which $a$ is an arbitrarily large though finite number. This is
the first regularity condition, mentioned above, which we shall
impose on the tails of the frequency function of $x$.
Then it is obvious, from the remark above, that


co 


Pe 
mo Fy) 


. ‘ se . 
is uniformly convergent in the interval for |y|l< 4 for 7 lelgcet 
And for |y|> & it is also obvious that for A|E|< ¢ < J, 


z, Fn yy) 


is uniformly convergent if our first regularity condition is satisfied. 
And since $|\epsilon|$ may be taken arbitrarily small, the inequality (3)
follows as before.

It is evident that if our first regularity condition holds,
the condition (6) is more general than the condition (1). And
it is easily seen that this first regularity condition holds for a very
wide class of frequency functions. For, in order for it to hold,
it is sufficient that the frequency curve (continuous or not) outside
some finite interval $(-a, a)$ about the mean as center, be never
increasing as $|x|$ increases and that if $f(x)$ be the ordinate of
the frequency curve at the abscissa $x$, always $f(x) \ge f(-x)$ or
else always $f(x) \le f(-x)$ for $x > a$.

But if the first regularity condition be satisfied, then for all
intervals which include $(-a, a)$ the corresponding moments have
upper limits in absolute value. And if this be so for all such
intervals, the semi-invariants (of Thiele) will also have upper
limits for their absolute values. If $\lambda_k$ is the $k$-th semi-invariant,
we will take for our second regularity condition on the tails of the
frequency distribution of $x$, that

$$|\lambda_k| \le \frac{\sigma^2}{2}\,h^{k-2}\,k!, \qquad k \ge 2 \quad (\lambda_2 = \mu_2), \tag{7}$$

for some $h > 0$ if $\lambda_k$ is taken for any interval which includes
the arbitrarily large, though finite, interval $(-a, a)$.

If this second regularity condition holds, it is again easy to
show that (5) is an equality if $h|\epsilon| \le c < 1$. The right member
of (5) is still uniformly convergent in the interval $(-a, a)$ for
$h|\epsilon| \le c < 1$. For all intervals which include $(-a, a)$ we use
the formal identity which defines the semi-invariants of Thiele:

$$e^{\lambda_2\frac{\epsilon^2}{2!} + \lambda_3\frac{\epsilon^3}{3!} + \cdots} = 1 + \mu_2\frac{\epsilon^2}{2!} + \mu_3\frac{\epsilon^3}{3!} + \cdots = e^{\psi(\epsilon)}. \tag{8}$$

Under the condition (7), $\psi(\epsilon)$ is uniformly convergent over the
intervals in question for $h|\epsilon| \le c < 1$ and for these values of $\epsilon$,
(8) becomes an equality since its second member is only the first
arranged in powers of $\epsilon$. Moreover, on account of (7) the right
member must be uniformly convergent for all intervals which
include $(-a, a)$.

At least one important class of frequency distributions satisfies
our second regularity condition. The distributions of characteristics
in samples of $N$ have finite ranges as long as $N$ is finite
and they commonly have semi-invariants which are rapidly decreasing
with increasing $N$. If such distributions approach normality
their semi-invariants of order above the second approach
zero, in particular they may become in absolute value less than or
equal to the corresponding semi-invariants of a Pearson's Type
III distribution which are given by

$$\lambda_k = (k-1)!\,\sigma^k\left(\frac{\alpha_3}{2}\right)^{k-2},$$

in which $\alpha_3 = \mu_3/\sigma^3$, or

$$\lambda_k = (k-1)!\,\lambda_2\left(\frac{\lambda_3}{2\lambda_2}\right)^{k-2}.$$

Taking $h = |\lambda_3/(2\lambda_2)|$ it is easy to see that such distributions satisfy
our second regularity condition. The smaller the skewness of the
Type III distribution, the smaller $h$ may be taken. Thus in such
cases we can give a lower limit for $P(|x| \le t\sigma)$, the probability
that $|x| \le t\sigma$, which is improved with decreasing skewness of
the Type III distribution. By the use of the first regularity condition
we could only take $h = \sigma/2$ as the distribution approaches
normality.

As a second application, let us suppose that x = x,+x, in 
which x, and %, are independent, and in which the semi-invari- 
ants of the distribution of x, are 2 (= a*f c. Z---,and the 
semi-invariants of the distribution of x, are 4 Cg)t Gyr 
Then the distribution of x has for semi-invariants 


Ags 000), Aga yey, Ag Gt Gg 
Further let it be assumed that FZ <i, and that the distribu- 
tion of x, satisfies our second regularity condition. 
Then 
P( |x| ta) >P/|x,|s ¢a,) A |x,| < t(o-2,)) 


But 
P| « #00-4))= PU eels SPH ae ) 





-24(¢-0,)" 
ae 
*2+2h t(a-c,) 
: »I-Ze Ge Ze 
Now 
4 (t+22) % 
0-0, A47+g7)*-¢, <L+% 
G, —_ <i if o<x<f 
so that we get g* 
2+2he 
A lx,|<¢0o-6))>1-Ze ,, 
This gives finally in such cases 22 


“2 
P/\x\< ta)» P(\x,| <ta)-2Ze 


A. Dc. 











ON CORRELATION SURFACES OF SUMS WITH A 
CERTAIN NUMBER OF RANDOM ELEMENTS 
IN COMMON* 


By 
Carl H. Fischer


Introduction. The study of correlation due to a common
factor has been a more or less familiar one in the literature of
mathematical statistics. Kapteyn,¹ in an exposition of the Pearsonian
coefficient of correlation, considered the correlation between
two sums of normally distributed variables, the sums having
$k$ random elements in common. In 1920, Rietz² devised urn
schemata which yield sums with common items involved in such
a way that the correlation and regression properties can be dealt
with by a priori methods. In a later paper, Rietz³ considered two
variables, each the sum of two random drawings of elements from a
continuous rectangular distribution, with one of the elements in
common. Here, the emphasis was placed principally upon the
description of the correlation surface. Some other aspects and
extensions of this problem were brought out by Karl Pearson⁴ in
an editorial discussion of Rietz's paper.

In the literature, the theory of correlation has been discussed
principally in connection with its applications. One of the objects
of some of the above-mentioned papers is the establishment of a
closer connection between correlation theory and abstract probability
theory. Such a connection would give a more precise
*Presented to the American Mathematical Society, Dec. 28, 1931.

¹J. C. Kapteyn, "Definition of the Correlation-Coefficient," Monthly
Notices of the Royal Astronomical Society, Vol. 72 (1912), pp. 518-525.

²H. L. Rietz, "Urn Schemata as a Basis for the Development of Correlation
Theory," Annals of Mathematics, Vol. 21 (1920), pp. 306-322.

³H. L. Rietz, "A Simple Non-Normal Correlation Surface," Biometrika,
Vol. 24 (1932), pp. 288-291.

⁴Karl Pearson, "Professor Rietz's Problem," (Editorial), Biometrika,
Vol. 24 (1932), pp. 290-291.
meaning to correlation and would tend to make the study of correlation
theory more attractive to mathematicians. With this aim
in view, the present paper is concerned with correlation among
sums having common elements, extending and generalizing the
preceding papers in several ways.

We shall assume our drawings made from a continuous universe
characterized by a rather arbitrary law of distribution. We
shall define $n$ sums, each of an arbitrary number of elements,
formed in such a manner that any two consecutive sums have
elements in common, and inquire into the correlation between any
two of these sums. The equations of the correlation surfaces
will be expressed in terms of iterated integrals, the regression of
each variable on the other will be shown to be linear, and the
equations of the regression lines will be obtained. The coefficient
of correlation may then be computed from the slopes of these lines.

Throughout this paper we shall understand a probability
function, $f(t)$, to be, for all values of $t$ on a range $R$, a single-valued,
real-valued, non-negative, continuous function of $t$. It
is then Riemann integrable on $R$, and we shall require that
$\int_R f(t)\,dt = 1$. We define the probability that a value of $t$,
drawn at random from the range $R$, lie in the interval $(a, b)$,
$a$ and $b$ in $R$ and $b \ge a$, to be $\int_a^b f(t)\,dt$. We may then say
that $f(t)\,dt$ is, to within infinitesimals of higher order, the probability
that a value of $t$ drawn at random lies in the interval
$(t, t + dt)$. Bachelier⁵ has classified probabilities into those of
the first, second, and third kinds, and Craig⁶ has extended this to
probability functions, according as $R$ is the range $(-\infty, \infty)$, $(0, \infty)$,
and $(0, a)$, respectively. We shall find it convenient to adopt
this classification.


⁵L. Bachelier, "Calcul des Probabilités," (1912), p. 155.

⁶Allen T. Craig, "On the Distribution of Certain Statistics," American
Journal of Mathematics, Vol. 54 (1932), pp. 353-366.
I. Sums of elements drawn from a universe characterized
by a probability function of the first kind.

1. The correlation between two sums having random elements
in common. Let $f(t)$, a probability function of the first
kind, characterize the distribution of the variable $t$. Let the
principal variable $x_1$ be defined as the sum of $n_1$ independent
values of $t$ drawn at random. Further, let the principal variable
$x_2$ be defined as the sum of $k_{12}$ random values of the $n_1$ values
of $t$ composing $x_1$ and of $n_2 - k_{12}$ independent random values of
$t$ taken directly from the universe characterized by $f(t)$.

Theorem I. Given the sums $x_1$ and $x_2$ as defined above,
with $k_{12}$ random elements in common.

a) The marginal distributions of x, and x,are given, respec- 


tively, by 


GEA fit, Hg) Ag HO b ys) ge Ly 


and 


yay PPL A Gt MO gerd Aa ages ) 


@%-/ 
* f(<, “4 as Z Ww “ 2,-/72%, Mgt Pb 49M 


b) The correlation surface, ws F(X), %, ), or the simulta- 
neous law of distribution of x, and x,, is given by 
co @ 
2 L I EIA Mlirtz ty bags? Wn 
ene oi 


X ltt; iKyp 2g I ny C4 9 Mg Be 


c) The regression curves of $x_2$ on $x_1$ and of $x_1$ on $x_2$ are
linear, and are given, respectively, by the following equations:

$$\bar{x}_2 = \frac{k_{12}}{n_1}\,x_1 + (n_2 - k_{12})M, \tag{1.31}$$

and

$$\bar{x}_1 = \frac{k_{12}}{n_2}\,x_2 + (n_1 - k_{12})M, \tag{1.32}$$

where $M = \int_{-\infty}^{\infty} t\,f(t)\,dt$.

Hence, the coefficient of correlation between $x_1$ and $x_2$ is

$$r_{x_1x_2} = \frac{k_{12}}{(n_1n_2)^{1/2}}.$$
Proof. The proofs of the expressions for the marginal distributions
of $x_1$ and $x_2$ are given by Craig⁷ and need not be
repeated here. The correlation surface $w = F(x_1, x_2)$ is derived
by a simple extension of the same method to two independent
variables.
The regression curve of $x_2$ on $x_1$ is the locus of the ordinate
of the centroid $\bar{x}_2$ of a section of the surface for any given $x_1$.


Thus ” 
[ I tig M1, 4) Le] 


[Z F(t, Xp IL%, ) 


It will be convenient in what follows to use an abbreviated nota- 


(1.4) Z, = 


tion by letting 


(1.5) Of%,,z 


re bye" 


: by Pt) Mh, nD by, nt 


which is merely the integrand of the marginal distribution of x, . 
Where no ambiguity can result, eo, will be used in place of 


Ol x,, t be tow ee ie as may be written 


7 


[EL Goto ten dE, ty, tag hyp g x apr” a 
x Lb (Ey os ME np Lh, 
Now let v= x,-¢, ----% be t, Kg? bn, y Changing the variable 


⁷Allen T. Craig, loc. cit., pp. 355-356.
from $x_2$ to $v$, (1.4) becomes
@ co Oe oo o op 
(1.6) %y = aoe 


tI 
LS: BN" [-)4 fr, ttt, yt de, , 7 ett, ar} 


oo oo M-/ 
a OD EE, ny oat, ef 


4é@ 


It will be noted that the terms in the numerator fall into two 
groups: those terms containing the factors 4,-,(¢=1,2--- 4, ), 
and those terms containing the factors v org, hgh ize, to Me -4). 
Further, since the order of integration nor is immaterial, the 
equality of the A, integrals of the first group follows readily. 
Similarly, the equality of the 7,-4,, integrals of the second 
group follows. The expression _ may then be written 


(17) ef h J Sts8 J fy )ildavat 


7-1 4 


7-1 
+(%- tel f- [v4 9, Fas t(t,,)tJavd, De yj at, 


/ j / of &, Tf, H(t, )t(e)advdt,, a, | 


-00 “-00 42 


In (1.7), it is clear that the integrations with respect to each Z2; 7, 
may be effected immediately, making use of Lo tli) dve«l 

In the first term of the numerator and in the denominator the
variable $v$ may likewise be integrated out. The denominator is
now equal to (1.11), the marginal distribution function of $x_1$. In
the second term of the numerator, $v\,f(v)$ is independent of the
remaining factors, and $\int_{-\infty}^{\infty} v\,f(v)\,dv$ is a constant which we shall
denote by $M$. This second term of the numerator is now equal
to $(n_2 - k_{12})M$ times the marginal distribution function of $x_1$.
Hence, we have now reduced the expression (1.7) for $\bar{x}_2$ to the
following form:


_— a, =k aL, + (Me- Kg) M, 
fo St, 0l«,t - b, jdt + et, 
where J, = rn 
— Ox, t,t, jae, — 
L L ~ 1,7,-/ 479-1 “u 
To evaluate $I_1$, let $t_1 = x_1 - u - t_2 - \cdots - t_{n_1-1}$. Then
$$I_1 = \frac{\int_{-\infty}^{\infty}\!\cdots\!\int_{-\infty}^{\infty} (x_1 - u - t_2 - \cdots - t_{n_1-1})\, G(x_1, u, t_2, \dots, t_{n_1-1})\, dt_{n_1-1}\cdots dt_2\, du}{\int_{-\infty}^{\infty}\!\cdots\!\int_{-\infty}^{\infty} G(x_1, u, t_2, \dots, t_{n_1-1})\, dt_{n_1-1}\cdots dt_2\, du}.$$

Written as a sum of $n_1$ fractions over the common denominator, the first term in the above expression for $I_1$ is equal to $x_1$. Each of the remaining $n_1 - 1$ terms is equal in magnitude to $I_1$ and enters with a negative sign. Hence
$$I_1 = x_1 - (n_1 - 1) I_1,$$
and
$$I_1 = \frac{x_1}{n_1}. \tag{1.9}$$


From (1.8) and (1.9), we have
$$\bar{x}_2 = \frac{k_{12}}{n_1}\,x_1 + (n_2 - k_{12})M.$$
In exactly the same manner, we may show that
$$\bar{x}_1 = \frac{k_{12}}{n_2}\,x_2 + (n_1 - k_{12})M.$$
Making use of the fact that in the case of linear regression the square of the correlation coefficient is equal to the product of the slopes of the two lines of regression, we obtain
$$r_{x_1 x_2} = \frac{k_{12}}{(n_1 n_2)^{1/2}},$$
which completes the proof of the theorem.

Corollary. If $x$ and $y$ are each the sum of $n$ independent random values of a variable $t$ from a universe characterized by $f(t)$, and have $k$ of these values in common, the coefficient of correlation between $x$ and $y$ is equal to the ratio of the number of values of $t$ held in common to the total number composing each principal variable. Thus, $r_{xy} = k/n$.

This corollary of Theorem I was proved by Kapteyn$^8$ for the special case of a normal parent distribution of the variable $t$.

Illustration. As a simple illustration of the application of the foregoing theorem, let us consider the case where $x_1 = t_1 + t_2$, $x_2 = t_1 + t_{22}$, with $t_1$, $t_2$, $t_{22}$ as independent random drawings of $t$ from the Gaussian distribution,
$$f(t) = (2\pi)^{-1/2} e^{-t^2/2}.$$
From (1.11), the marginal distribution of $x_1$ is
$$G(x_1) = (4\pi)^{-1/2} e^{-x_1^2/4}.$$
Similarly, the marginal distribution of $x_2$ is
$$G(x_2) = (4\pi)^{-1/2} e^{-x_2^2/4}.$$
The correlation surface, $w = F(x_1, x_2)$, obtained by applying (1.2), is
$$F(x_1, x_2) = \frac{1}{2\pi\, 3^{1/2}}\, e^{-(x_1^2 - x_1 x_2 + x_2^2)/3},$$
a normal correlation surface with $r_{x_1 x_2} = \tfrac{1}{2}$.
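The result of Theorem I can be checked numerically. The following is a minimal sketch, not part of the original argument: it assumes NumPy is available, uses a standard normal parent as in the illustration, and the function names `simulate_shared_sums` and `theoretical_r` are hypothetical helpers introduced only for this check.

```python
import numpy as np

def simulate_shared_sums(n1, n2, k12, trials=200_000, seed=0):
    """Draw x1 (sum of n1 values) and x2 (sum of n2 values, k12 shared with x1)."""
    rng = np.random.default_rng(seed)
    t1 = rng.standard_normal((trials, n1))            # the n1 values composing x1
    fresh = rng.standard_normal((trials, n2 - k12))   # values drawn only for x2
    x1 = t1.sum(axis=1)
    # The draws are i.i.d., so using the first k12 columns is equivalent
    # to choosing k12 of the n1 values at random.
    x2 = t1[:, :k12].sum(axis=1) + fresh.sum(axis=1)
    return x1, x2

def theoretical_r(n1, n2, k12):
    """Coefficient of correlation stated in Theorem I."""
    return k12 / np.sqrt(n1 * n2)

# The illustration: x1 = t1 + t2, x2 = t1 + t22.
x1, x2 = simulate_shared_sums(n1=2, n2=2, k12=1)
r_hat = np.corrcoef(x1, x2)[0, 1]
slope, intercept = np.polyfit(x1, x2, 1)   # empirical regression of x2 on x1
print(r_hat, theoretical_r(2, 2, 1), slope, intercept)
```

With a normal parent of mean zero, $M = 0$, so the fitted line should pass near the origin with slope near $k_{12}/n_1$.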
2. The correlation among three sums. We now proceed to extend the preceding theorem to more than two sums. Let us define a third sum, or principal variable, $x_3$, as the sum of $k_{23}$ elements taken at random from the $n_2$ values of $t$ composing $x_2$ plus the sum of $n_3 - k_{23}$ independent random values of $t$ drawn from the parent population. It is apparent, then, that the marginal distributions of $x_1$, $x_2$, and $x_3$, and the correlation surfaces $F(x_1, x_2)$ and $F(x_2, x_3)$ will be formed exactly as were those of $x_1$ and $x_2$ in Theorem I. From this theorem, we are at once in a position to write the equations of the lines of regression and the coefficients of correlation for these surfaces. The surface $w = F(x_1, x_3)$ remains to be investigated, as does the four-dimensional surface, $v = V(x_1, x_2, x_3)$, which may be obtained in almost the same manner.

8 J. C. Kapteyn, loc. cit.

Theorem II. Given $f(t)$ and $x_1$, $x_2$, $x_3$, as defined above.


Let G,- be defined as in (1.5). Let 
G(%, t),°° hag 3 le kag gtl? % 4 on bas” 


Key’ & = 9 
gpd Fy) Care Pg) 
xf(-t,,-- bk, 2H “as rer! 7 Sa -by ns) 


If $f(t)$ is a probability function of the first kind, then the expression for the simultaneous distribution of $x_1$ and $x_3$ is


(2.1) 
7727 
line Be Sal G WoL aM big 
aaa 64-19 LG ny A tyge! 


zt Az ac 
X Loa, ate Weg gt! ey i hoy 


where by $\binom{c}{d}$ is understood the number of combinations of $c$ items taken $d$ at a time.
Proof. Let us temporarily require that $k_{12} \ge k_{23}$. We shall show later that this restriction may be removed. The probability that $x_1$ and $x_3$ as defined contain $k_{23} - g$ $(g = 0, 1, 2, \dots, k_{23})$ elements in common is
$$\binom{k_{12}}{k_{23}-g}\binom{n_2 - k_{12}}{g} \Big/ \binom{n_2}{k_{23}}.$$
The probability of the occurrence of any given pair of values $(x_1, x_3)$, that is, the probability of a point falling into a given rectangle, $(x_1, x_1 + dx_1;\ x_3, x_3 + dx_3)$, is the sum of the probabilities of all of the mutually exclusive ways in which it can occur. Each of the terms in (2.1) multiplied by $dx_1\,dx_3$ consists of the integral (derived by the method of Theorem I), which is the probability, to within infinitesimals of higher order, of the occurrence of a given pair, $(x_1, x_3)$, with a specified number of values of $t$ in common, times a coefficient which is equal to the probability of the occurrence of this specified number of values of $t$ in common. Each of the terms as a whole, then, is the probability that the given $(x_1, x_3)$ will occur with a specified number of values of $t$ in common. Hence, the expression (2.1), being the sum of the probabilities of all of the mutually exclusive ways in which $x_1$ and $x_3$ can fall within the desired rectangle, is the probability that this will occur. This establishes the theorem when $k_{12} \ge k_{23}$.

If $k_{12} < k_{23}$, then the maximum number of values of $t$ which $x_1$ and $x_3$ can have in common is $k_{12}$. The expression for $F(x_1, x_3)$ in this case, then, consists of the sum of all of the terms of (2.1) beginning with the term where $x_1$ and $x_3$ have $k_{12}$ values of $t$ in common and continuing to include the term derived from the case where they have no values of $t$ in common. Equation (2.1), however, in its present form may be considered as a correct formal expression for the correlation surface even when $k_{12} < k_{23}$, since in this case all of the coefficients of the terms where $x_1$ and $x_3$ are to have more than $k_{12}$ values of $t$ in common are zero. This follows from the definition $\binom{c}{d} = 0$ if $c < d$. Thus
$$\binom{k_{12}}{k_{23}} = \binom{k_{12}}{k_{23}-1} = \cdots = \binom{k_{12}}{k_{12}+1} = 0 \quad \text{if } k_{12} < k_{23}.$$
Hence, we may now remove the restriction that $k_{12} \ge k_{23}$. This establishes the theorem.
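The coefficients used in this proof can be tabulated exactly. The sketch below is illustrative only: it assumes the reading in which $x_1$ and $x_3$ share $k_{23} - g$ values of $t$ with probability $\binom{k_{12}}{k_{23}-g}\binom{n_2-k_{12}}{g}/\binom{n_2}{k_{23}}$, and the function names are hypothetical. It uses a case with $k_{12} < k_{23}$ to show the vanishing coefficients.

```python
from fractions import Fraction
from math import comb

def probability_coefficient(k12, k23, n2, g):
    """Chance that x1 and x3 share exactly k23 - g values of t."""
    # math.comb(c, d) returns 0 when d > c, matching the convention C(c, d) = 0.
    return Fraction(comb(k12, k23 - g) * comb(n2 - k12, g), comb(n2, k23))

def coefficient_table(k12, k23, n2):
    """Coefficients for g = 0, 1, ..., k23."""
    return [probability_coefficient(k12, k23, n2, g) for g in range(k23 + 1)]

k12, k23, n2 = 2, 4, 6                      # a case with k12 < k23
coeffs = coefficient_table(k12, k23, n2)
total = sum(coeffs)
mean_common = sum((k23 - g) * c for g, c in enumerate(coeffs))
print(coeffs, total, mean_common)
```

The coefficients for more than $k_{12}$ common values come out as zero, the whole set sums to one, and the mean number of common values equals $k_{12}k_{23}/n_2$, which is the quantity that enters the regression slopes.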
We are now in a position to write down the surface $v = V(x_1, x_2, x_3)$.


It is given by the following ayia where, by 4 
is meant any g values of the tz 


Vy) Mie YY 8 ange ng 


XV (4,-b,--- ln 2K 01 Be FOG ig os We ) 


gel hy 
hog GV Sh, 


la “+ té 27/ 
Ae WN hag BE 2a hay gt! Blog Steg! Bap? 
Lb 1 Le gor Se 


Theorem III. The regression curves of $x_3$ on $x_1$ and of $x_1$ on $x_3$ for the correlation surface $w = F(x_1, x_3)$, defined in Theorem II, are linear and are given, respectively, by the following equations:
$$\bar{x}_3 = \frac{k_{12}k_{23}}{n_1 n_2}\,x_1 + \frac{n_2 n_3 - k_{12}k_{23}}{n_2}\,M \tag{2.21}$$
and
$$\bar{x}_1 = \frac{k_{12}k_{23}}{n_2 n_3}\,x_3 + \frac{n_1 n_2 - k_{12}k_{23}}{n_2}\,M, \tag{2.22}$$
where $M$ is defined as in Theorem I. Further, the coefficient of correlation between $x_1$ and $x_3$ is
$$r_{x_1 x_3} = \frac{k_{12}k_{23}}{n_2 (n_1 n_3)^{1/2}} = r_{x_1 x_2}\, r_{x_2 x_3}. \tag{2.3}$$
Proof. As in the proof of Theorem I, we set up the expression for the locus of the ordinate of the centroid of a section of the surface for a fixed $x_1$. We have
$$\bar{x}_3 = \frac{\int_{-\infty}^{\infty} x_3\, F(x_1, x_3)\,dx_3}{\int_{-\infty}^{\infty} F(x_1, x_3)\,dx_3},$$
where $F(x_1, x_3)$ is given by (2.1). From the definition of a marginal distribution, we know that $\int_{-\infty}^{\infty} F(x_1, x_3)\,dx_3$ reduces to (1.11), the marginal distribution of $x_1$. Let us now write the expression for $\bar{x}_3$ as the sum of $k_{23}+1$ fractions. Thus


on ack (4) "sf L Lon 


KG (Apt b, hes F 2 Do hag tl bh 44 op hong 


A hy ys laos Cy yy get Bangs te 


Hereafter, we shall call an expression of the form
$$\binom{k_{12}}{k_{23}-g}\binom{n_2 - k_{12}}{g} \Big/ \binom{n_2}{k_{23}}$$
a "probability coefficient." Then (2.4) is the sum of products, each of which is a probability coefficient times an expression which is equivalent to the expression for $\bar{x}_3$ for the simple case where $x_3$ would be derived directly from $x_1$ by the drawing of $k_{23} - g$ values of $t$ from $x_1$. These latter expressions, by the application of Theorem I, may each be written in the same form as (1.3). Hence, (2.4) has been reduced to


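$$\bar{x}_3 = \sum_{g=0}^{k_{23}} \frac{\binom{k_{12}}{k_{23}-g}\binom{n_2-k_{12}}{g}}{\binom{n_2}{k_{23}}} \left[\frac{(k_{23}-g)\,x_1}{n_1} + (n_3 - k_{23} + g)\,M\right]. \tag{2.5}$$

By the use of a well-known theorem of combinatory analysis,$^9$ we have that
$$\sum_{g=0}^{k_{23}} \frac{k_{23}-g}{n_1}\, \frac{\binom{k_{12}}{k_{23}-g}\binom{n_2-k_{12}}{g}}{\binom{n_2}{k_{23}}} = \frac{k_{12}k_{23}}{n_1 n_2}$$
and
$$\sum_{g=0}^{k_{23}} (n_3 - k_{23} + g)\, \frac{\binom{k_{12}}{k_{23}-g}\binom{n_2-k_{12}}{g}}{\binom{n_2}{k_{23}}} = (n_3 - k_{23}) + \sum_{g=0}^{k_{23}} g\, \frac{\binom{k_{12}}{k_{23}-g}\binom{n_2-k_{12}}{g}}{\binom{n_2}{k_{23}}},$$
which reduces to
$$(n_3 - k_{23}) + \frac{(n_2 - k_{12})\,k_{23}}{n_2} = \frac{n_2 n_3 - k_{12}k_{23}}{n_2}$$
by the same theorem of combinatory analysis.

Hence, (2.5) becomes
$$\bar{x}_3 = \frac{k_{12}k_{23}}{n_1 n_2}\,x_1 + \frac{n_2 n_3 - k_{12}k_{23}}{n_2}\,M.$$
In exactly the same manner, we may show that
$$\bar{x}_1 = \frac{k_{12}k_{23}}{n_2 n_3}\,x_3 + \frac{n_1 n_2 - k_{12}k_{23}}{n_2}\,M.$$
We then obtain the coefficient of correlation from the slopes of these lines. It is
$$r_{x_1 x_3} = \frac{k_{12}k_{23}}{n_2 (n_1 n_3)^{1/2}} = r_{x_1 x_2}\, r_{x_2 x_3}.$$
This completes the proof of the theorem, since
$$r_{x_1 x_2} = \frac{k_{12}}{(n_1 n_2)^{1/2}}, \qquad r_{x_2 x_3} = \frac{k_{23}}{(n_2 n_3)^{1/2}}.$$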
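A Monte Carlo check of the product rule for three sums may be sketched as follows. This is an illustrative experiment rather than anything from the original: it assumes NumPy (version 1.20 or later, for `Generator.permuted`), a standard normal parent, and arbitrary illustrative values of $n_1, n_2, n_3, k_{12}, k_{23}$; `simulate_three_sums` is a hypothetical helper.

```python
import numpy as np

def simulate_three_sums(n1, n2, n3, k12, k23, trials=200_000, seed=1):
    """Build x1, x2, x3 by the successive random-selection scheme of Section 2."""
    rng = np.random.default_rng(seed)
    t1 = rng.standard_normal((trials, n1))                    # values composing x1
    t2 = np.concatenate(
        [t1[:, :k12], rng.standard_normal((trials, n2 - k12))], axis=1
    )                                                         # values composing x2
    # Shuffle every row independently so that the first k23 columns are a
    # random choice of k23 of the n2 values composing x2.
    picked = rng.permuted(t2, axis=1)[:, :k23]
    fresh = rng.standard_normal((trials, n3 - k23))           # new values for x3
    x3 = picked.sum(axis=1) + fresh.sum(axis=1)
    return t1.sum(axis=1), t2.sum(axis=1), x3

n1, n2, n3, k12, k23 = 4, 5, 3, 3, 2
x1, x2, x3 = simulate_three_sums(n1, n2, n3, k12, k23)
r12 = np.corrcoef(x1, x2)[0, 1]
r23 = np.corrcoef(x2, x3)[0, 1]
r13 = np.corrcoef(x1, x3)[0, 1]
r13_theory = k12 * k23 / (n2 * np.sqrt(n1 * n3))
print(r13, r12 * r23, r13_theory)
```

The three printed numbers should agree to within sampling error.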


3. The correlation among $p$ sums. We now extend our discussion to $p$ principal variables, forming each successive one in the same manner in which $x_2$ and $x_3$ were formed above; that is, $x_i$ $(i = 2, 3, \dots, p)$ is equal to the sum of $k_{i-1,i}$ random drawings of $t$ from the constituent values of $t$ forming $x_{i-1}$, plus the sum of $n_i - k_{i-1,i}$ independent random drawings of $t$ directly from the universe characterized by $f(t)$. The correlation surface, $w = F(x_1, x_p)$, can at once be written in the same manner as the surface considered in Theorem II. That is, each term of the expression for $F(x_1, x_p)$ multiplied by $dx_1\,dx_p$ consists of an iterated integral which represents the probability, to within infinitesimals of higher order, of the occurrence of a given pair, $(x_1, x_p)$, with a specified number of values of $t$ in common, times a probability coefficient which represents the probability of the occurrence of this specified number of values of $t$ in common. This same method may be employed in writing the correlation surface for any pair of principal variables. The expressions for the probability coefficients, however, become increasingly complex as the number of ways in which the two principal variables can have $0, 1, 2, \dots$ values of $t$ in common increases.

9 E. Netto, "Lehrbuch der Combinatorik," (1901), pp. 12-13.

The following theorem can be proved by mathematical in- 
duction. The proof is not difficult, though tedious, and on that 
account will not be presented here. 

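Theorem IV. If $f(t)$ is a probability function of the first kind, and $F(x_1, x_p)$ is the simultaneous law of distribution of $x_1$ and $x_p$, then the regression of $x_p$ on $x_1$ and of $x_1$ on $x_p$ are linear and are given, respectively, by the following equations:
$$\bar{x}_p = \frac{k_{12}k_{23}\cdots k_{p-1,p}}{n_1 n_2 \cdots n_{p-1}}\,x_1 + \frac{n_2 n_3 \cdots n_p - k_{12}k_{23}\cdots k_{p-1,p}}{n_2 n_3 \cdots n_{p-1}}\,M, \tag{3.1}$$
$$\bar{x}_1 = \frac{k_{12}k_{23}\cdots k_{p-1,p}}{n_2 n_3 \cdots n_p}\,x_p + \frac{n_1 n_2 \cdots n_{p-1} - k_{12}k_{23}\cdots k_{p-1,p}}{n_2 n_3 \cdots n_{p-1}}\,M. \tag{3.2}$$
Further, the coefficient of correlation between $x_1$ and $x_p$ is
$$r_{x_1 x_p} = \frac{k_{12}k_{23}\cdots k_{p-1,p}}{n_2 n_3 \cdots n_{p-1}\,(n_1 n_p)^{1/2}} = r_{x_1 x_2}\, r_{x_2 x_3} \cdots r_{x_{p-1} x_p}. \tag{3.3}$$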
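That the closed form for the correlation between the first and last sums equals the product of the correlations of adjacent sums is a matter of arithmetic, which the following sketch verifies for sample values. The function names and the particular values of $n_i$ and $k_{i-1,i}$ are hypothetical; only the Python standard library is used.

```python
from math import isclose, prod, sqrt

def chain_correlation(n, k):
    """Closed form for r between x_1 and x_p.

    n = [n_1, ..., n_p]; k = [k_12, k_23, ..., k_{p-1,p}].
    """
    interior = prod(n[1:-1])                     # n_2 n_3 ... n_{p-1} (1 if p = 2)
    return prod(k) / (interior * sqrt(n[0] * n[-1]))

def product_of_adjacent(n, k):
    """Product of the pairwise coefficients k_{i,i+1} / sqrt(n_i n_{i+1})."""
    return prod(k_i / sqrt(a * b) for k_i, a, b in zip(k, n[:-1], n[1:]))

n = [4, 5, 3, 6]        # four principal variables
k = [3, 2, 2]
r_chain = chain_correlation(n, k)
r_product = product_of_adjacent(n, k)
print(r_chain, r_product)
```

For $p = 2$ the same function reduces to the coefficient of Theorem I, for example `chain_correlation([2, 2], [1])` gives one half.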
II. Sums of elements drawn from a universe characterized by a probability function of the second kind.

4. The correlation between two sums. Let $f(t)$, a probability function of the second kind, characterize the distribution of the variable $t$. Let the principal variable $x_1$ be defined as the sum of $n_1$ independent values of $t$ drawn at random. Further, let the principal variable $x_2$ be defined as the sum of $k_{12}$ random values of the $n_1$ values of $t$ composing $x_1$ and of $n_2 - k_{12}$ independent random values of $t$ taken directly from the universe characterized by $f(t)$.

Theorem V. Given the sums $x_1$ and $x_2$ as defined above with $k_{12}$ random elements in common.

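a) The marginal distributions of $x_1$ and $x_2$ are given, respectively, by
$$G(x_1) = \int_0^{x_1}\!\int_0^{x_1 - t_1}\!\cdots\!\int_0^{x_1 - t_1 - \cdots - t_{n_1-2}} f(t_1)\cdots f(t_{n_1-1})\, f(x_1 - t_1 - \cdots - t_{n_1-1})\; dt_{n_1-1}\cdots dt_1 \tag{4.11}$$
and
$$G(x_2) = \int_0^{x_2}\!\cdots\!\int_0^{x_2 - t_1 - \cdots - t_{2,n_2-2}} f(t_1)\cdots f(t_{k_{12}})\, f(t_{2,k_{12}+1})\cdots f(t_{2,n_2-1})\, f(x_2 - t_1 - \cdots - t_{k_{12}} - t_{2,k_{12}+1} - \cdots - t_{2,n_2-1})\; dt_{2,n_2-1}\cdots dt_{2,k_{12}+1}\, dt_{k_{12}}\cdots dt_1. \tag{4.12}$$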


b) The correlation surface, $w = F(x_1, x_2)$, which is in two distinct parts joined along the plane $x_1 - x_2 = 0$, is given by

(4.2a)


-b, Hh pct nb, by pty, 
“2 7, ww ip 42 
Ff, (4, % ‘, )= fy L J 7 
7 () 
ti, 7 
XG (%, , bn Chie’ by ig o/s" 


2% ten ng Ae 


27 


Xd pyos By nap > Lh 


(% $ x, < 0); 


(4.2b) 


%,-4,-"°~ by n- 4 a - “ik 
E, (%,,%,)+ / af 7" 
avg oO Q 


| —_—— tins 8s ae? 4 ne 
° 


K S%, bb, np. PP qb ue Yn? ok, Mgr!’ wh yt] 
VM apt Ce hyot Ce np Cie 


(%, 5% <0) 


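c) The regression curves of $x_2$ on $x_1$ and of $x_1$ on $x_2$ are linear and are given, respectively, by the following equations:
$$\bar{x}_2 = \frac{k_{12}}{n_1}\,x_1 + (n_2 - k_{12})M, \tag{1.31}$$
and
$$\bar{x}_1 = \frac{k_{12}}{n_2}\,x_2 + (n_1 - k_{12})M, \tag{1.32}$$
where $M = \int_0^{\infty} t\,f(t)\,dt$.

Hence, the coefficient of correlation between $x_1$ and $x_2$ is
$$r_{x_1 x_2} = \frac{k_{12}}{(n_1 n_2)^{1/2}}.$$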

Proof. The proofs for the marginal distributions of $x_1$ and of $x_2$ are given by Craig$^{10}$ and need not be repeated here. The expressions for the correlation surface are derived by a simple extension of the same method to two independent variables. The limits of integration may be easily verified.

10 Allen T. Craig, loc. cit., p. 356.

As in the proof of Theorem I, the regression of $x_2$ on $x_1$ is given by the locus of the ordinate of the centroid of the section of the surface for a given $x_1$. However, as the surface here is in two distinct, but connected, parts, we have two terms in both numerator and denominator. The expression for $\bar{x}_2$ is
$$\bar{x}_2 = \frac{\int_0^{x_1} x_2\, F_1(x_1, x_2)\,dx_2 + \int_{x_1}^{\infty} x_2\, F_2(x_1, x_2)\,dx_2}{\int_0^{x_1} F_1(x_1, x_2)\,dx_2 + \int_{x_1}^{\infty} F_2(x_1, x_2)\,dx_2}, \tag{4.3}$$
where $F_1(x_1, x_2)$ and $F_2(x_1, x_2)$ are defined by (4.2a) and (4.2b), respectively.

In the paragraphs immediately following, we shall be concerned principally with interchanging the order of integration, with the accompanying changes in the limits. It will be convenient to write the differential immediately following its respective integral sign. Consider the first term of the numerator. Successively interchanging the order of integration between integration with respect to $x_2$ and with respect to $t_1, t_2, \dots, t_{k_{12}}$, respectively, and making the proper changes in the limits, we

get, writing g, for G(x Xz, oj °°" Liha" oe te ns A 


Byte ~% het * 
(49) [aft vf ty, ff at, 
ar 


2 Lytbgt* +t ™ 


Hb > ~ %,-ty-"-4 n-2 Xe ti **-kih,, 
Zz i at, aif Ph p01 
o 


oa 
et ting te gl ere 
f _. 


Qo 


Now consider the second term of the numerator of (4.3). As 
the limits are constants with respect to the variables of integration 
%;, Be gxicae Lin, 2 » we may interchange the order of integration 
successively until we have 


(4.5) 


Xiph, prt bags) 
[af ef 
1a, 
oo- Gb" , 
4a 
on*** 
(4 


apts Ea gt! 


Wt, +2°°** 


We may now combine the first and second terms, (4.4) and (4.5), 
getting 


1% HG Ey bagel - 
[ [a ef ay, | o%2 


tythagr tly, 


“hy ty nd 
“nt at, at 
3 


tu tihyg” Be, hag?! 


As the limits of integration are constant with respect to the variables $x_2$ and $t_{2,k_{12}+1}, \dots, t_{2,n_2-1}$, we may at once interchange successively the orders of integration with respect to $x_2$ and with respect to $t_{2,k_{12}+1}, t_{2,k_{12}+2}, \dots, t_{2,n_2-1}$, respectively, making the proper changes in the limits. We then have
(4.6)


4% b 
ZF, - . b ue 
The denominator of (4.3) may be reduced to this same form except for the absence of the factor $x_2$ in the integrand. Let us make the transformation
$$v = x_2 - t_1 - \cdots - t_{k_{12}} - t_{2,k_{12}+1} - \cdots - t_{2,n_2-1},$$
as was done in the proof of Theorem I. The limits $t_1 + \cdots + t_{2,n_2-1}$ to $\infty$ on $x_2$ now become $0$ to $\infty$ on $v$. We have now reduced (4.3) to the following form:
(4.7) 


x, x, p@ 
%2 = dy é Lf Lar: 
, x, p@ 
[f+ nat ws te, her * ofa 
o ° 


rs 7! 
+f J ¥] Se, jig! ty )4(e) deat, py 


o °o 


nae LE, ny-4 ~~~ Lhyy y/ 


{f-[ 46,77 a thy yl tM)avdt, pdt 01 Mi pp Lt . 










» Tg 4/ 





The denominator reduces at once to $G(x_1)$ in (4.11). As in the proof of Theorem I directly following equation (1.6), it will be noted that the terms of the numerator fall into two groups: those $k_{12}$ terms containing the factor $t_i$ $(i = 1, 2, \dots, k_{12})$, and the $n_2 - k_{12}$ terms containing the factor $v$ or $t_{2,j}$ $(j = k_{12}+1, \dots, n_2-1)$. As the limits of integration with respect to $v$ and these latter variables are $0$ and $\infty$, and since complete interchangeability of the order of integration is then permissible, it is readily seen that any two of these $n_2 - k_{12}$ terms are equivalent. The sum of the entire group, then, may be written

(4.8)


Z, wk np’ a8 
(75- tof 2 if - ee ee coee 
a, wo ve, ie / tt) fv) 
In (4.8), it is clear that the integrations with respect to each $t_{2,j}$ may be effected immediately by making use of the hypothesis that $\int_0^{\infty} f(t)\,dt = 1$. This leaves $v\,f(v)$, together with the factor depending on $x_1$, remaining as the integrand. The $\int_0^{\infty} v\,f(v)\,dv$ is a constant which we shall designate by $M$. Removing this constant from under the integral signs leaves us merely the expression for the marginal distribution of $x_1$ times $(n_2 - k_{12})M$. We then have


fate, fet; be Th ,, Fle; tle) 
(4.9) ¥,=(m-h)M% Z* z gen acer 
‘La M1 


That each term in the summation in the right member of (4.9) is equal to any other term in the summation follows from the complete interchangeability of the order of integration of any two consecutive variables, provided a corresponding interchange between these two variables is likewise carried out in the limits of integration. By successive interchanges of variables we may put the original $t_1, t_2, \dots, t_{k_{12}}$ in any order we choose. Hence, the sum of the last $k_{12}$ terms of (4.9) may be written as $k_{12}$ times any one of them. For definiteness, select the one containing the factor $t_1$ in the integrand of the numerator. We may now integrate out all of the $t_{2,j}$ $(j = k_{12}+1, \dots, n_2-1)$ and the $v$ exactly as before. Equation (4.9) then becomes
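$$\bar{x}_2 = (n_2 - k_{12})M + k_{12}\, \frac{\int_0^{x_1}\!\cdots\!\int_0^{x_1 - t_1 - \cdots - t_{n_1-2}} t_1\, G(x_1, t_1, \dots, t_{n_1-1})\, dt_{n_1-1}\cdots dt_1}{\int_0^{x_1}\!\cdots\!\int_0^{x_1 - t_1 - \cdots - t_{n_1-2}} G(x_1, t_1, \dots, t_{n_1-1})\, dt_{n_1-1}\cdots dt_1},$$
or
$$\bar{x}_2 = (n_2 - k_{12})M + k_{12} I_1.$$
It is not difficult to show that $I_1 = x_1/n_1$. Hence, we have
$$\bar{x}_2 = \frac{k_{12}}{n_1}\,x_1 + (n_2 - k_{12})M.$$
In exactly the same manner, we may show that
$$\bar{x}_1 = \frac{k_{12}}{n_2}\,x_2 + (n_1 - k_{12})M.$$
The coefficient of correlation between $x_1$ and $x_2$ is
$$r_{x_1 x_2} = \frac{k_{12}}{(n_1 n_2)^{1/2}},$$
which completes the proof of the theorem.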

Illustration. Consider the two sums, $x_1 = t_1 + t_2$, and $x_2 = t_1 + t_{22}$, with $t_1$, $t_2$, $t_{22}$ as random drawings of $t$ from the distribution characterized by the function $f(t) = e^{-t}$ for $t$ on the range $0$ to $\infty$. From (4.11), the marginal distribution of $x_1$ is
$$G(x_1) = x_1 e^{-x_1}.$$
Similarly, the marginal distribution of $x_2$ is
$$G(x_2) = x_2 e^{-x_2}.$$
The correlation surface, obtained by applying (4.2a) and (4.2b), is
$$F_1(x_1, x_2) = e^{-x_1}\left(1 - e^{-x_2}\right), \qquad (0 \le x_2 \le x_1);$$
and
$$F_2(x_1, x_2) = e^{-x_2}\left(1 - e^{-x_1}\right), \qquad (x_1 \le x_2 < \infty).$$
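For this illustration $M = 1$, so the regression of $x_2$ on $x_1$ should be $\bar{x}_2 = x_1/2 + 1$. A numerical sketch of the centroid computation (4.3) for the two-part surface follows. It is an assumption-laden check, not the original derivation: the midpoint rule, the truncation of the infinite range at `x1 + tail`, and the function names are all choices made here for illustration.

```python
import math

def surface(x1, x2):
    """Two-part correlation surface for f(t) = exp(-t), x1 = t1 + t2, x2 = t1 + t22."""
    if x2 <= x1:
        return math.exp(-x1) * (1.0 - math.exp(-x2))   # F_1, 0 <= x2 <= x1
    return math.exp(-x2) * (1.0 - math.exp(-x1))       # F_2, x1 <= x2

def midpoint(func, a, b, steps=20_000):
    """Composite midpoint rule on [a, b]."""
    h = (b - a) / steps
    return h * sum(func(a + (i + 0.5) * h) for i in range(steps))

def regression_x2_on_x1(x1, tail=60.0):
    """Ordinate of the centroid of the section of the surface at a given x1."""
    upper = x1 + tail                                   # stands in for infinity
    num = (midpoint(lambda x2: x2 * surface(x1, x2), 0.0, x1)
           + midpoint(lambda x2: x2 * surface(x1, x2), x1, upper))
    den = (midpoint(lambda x2: surface(x1, x2), 0.0, x1)
           + midpoint(lambda x2: surface(x1, x2), x1, upper))
    return num / den

value_at_2 = regression_x2_on_x1(2.0)
value_at_half = regression_x2_on_x1(0.5)
print(value_at_2, value_at_half)
```

The two values should be close to $2$ and $1.25$, that is, to $x_1/2 + 1$ at $x_1 = 2$ and $x_1 = 0.5$.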


5. The correlation among more than two sums. We shall state, without proof, the following theorems.

Theorem VI. Given a probability function, $f(t)$, of the second kind, and three principal variables, $x_1$, $x_2$, $x_3$, defined as for Theorem II. Then the correlation surface $w = F(x_1, x_3)$ is given by
(5.1a)


1 by - hy G4 eg) 
4B Oph) & * (fe etd fa tif tas 9 
(72,) 9-2 er" 


Athy Pe 
7 ws ar hs : wo / ve 
4e2F , 


61%,-7 
By hm “4 gop” ot ae Gt Lee, 
Bly, 4554! 
o 
ay ty By rgd 
£ Me ng! Fe, $C tn a agg 


“7 
iy > "bey 2, dissanall oo “be ho tog a Ea mp2 
a 
° Say/ 
Ox, P(%, t, “s** Ios rH? “24, perl by ee | 
(%, # %< co), 
Theorem VII. The regression curves of $x_3$ on $x_1$ and of $x_1$ on $x_3$ of the correlation surface in Theorem VI are linear and are given, respectively, by the following equations:
$$\bar{x}_3 = \frac{k_{12}k_{23}}{n_1 n_2}\,x_1 + \frac{n_2 n_3 - k_{12}k_{23}}{n_2}\,M \tag{2.21}$$
and
$$\bar{x}_1 = \frac{k_{12}k_{23}}{n_2 n_3}\,x_3 + \frac{n_1 n_2 - k_{12}k_{23}}{n_2}\,M, \tag{2.22}$$
where $M$ is defined as in Theorem V. Further, the coefficient of correlation between $x_1$ and $x_3$ is
$$r_{x_1 x_3} = \frac{k_{12}k_{23}}{n_2 (n_1 n_3)^{1/2}} = r_{x_1 x_2}\, r_{x_2 x_3}. \tag{2.3}$$

Theorem VIII. The statement of this theorem differs from that of Theorem IV only in that $f(t)$ is now to be a probability function of the second kind.

III. Sums of elements drawn from a universe characterized by a probability function of the third kind.

6. The correlation between two sums. We shall now consider principal variables defined as the sums of values of $t$ drawn from a universe characterized by $f(t)$, a probability function of the third kind, defined on the range $0$ to $a$, and with
$$\int_0^a f(t)\,dt = 1.$$
The correlation surfaces are not developed with the same degree of generality as were those in the preceding pages because of the tediousness of the labor involved and the complexity of the correlation surface, which may consist of many sections joined together. Thus, if $x$ is the sum of $m$ values of $t$ and $y$ the sum of $n$, all drawn from a universe characterized by a probability function of the third kind, the correlation surface, $w = F(x, y)$, consists of $2(mn-1)$ sections, each having its own equation. Hence, only the case where $x$ and $y$ each consist of the sum of two values of $t$, with one of these held in common, will be considered here.

Theorem IX. Let $f(t)$, a probability function of the third kind, characterize the distribution of a variable $t$. Let the principal variables $x$ and $y$ be defined by the relations $x = t_1 + t_{12}$, $y = t_1 + t_{22}$, where $t_1$, $t_{12}$, $t_{22}$ are independent random drawings of $t$ from the universe.

a) The marginal distributions of $x$ and of $y$ are given by
$$G_1(x) = \begin{cases} \displaystyle\int_0^{x} f(t)\,f(x-t)\,dt, & (0 \le x \le a); \\[2ex] \displaystyle\int_{x-a}^{a} f(t)\,f(x-t)\,dt, & (a \le x \le 2a); \end{cases} \tag{6.11}$$
and
$$G_2(y) = \begin{cases} \displaystyle\int_0^{y} f(t)\,f(y-t)\,dt, & (0 \le y \le a); \\[2ex] \displaystyle\int_{y-a}^{a} f(t)\,f(y-t)\,dt, & (a \le y \le 2a). \end{cases} \tag{6.12}$$

b) The correlation surface, $w = F(x, y)$, is given by
$$F(x, y) = \begin{cases} \displaystyle\int_0^{y} f(t)\,f(x-t)\,f(y-t)\,dt, & (0 \le y \le x \le a); \\[2ex] \displaystyle\int_0^{x} f(t)\,f(x-t)\,f(y-t)\,dt, & (0 \le x \le y \le a); \\[2ex] \displaystyle\int_{y-a}^{x} f(t)\,f(x-t)\,f(y-t)\,dt, & (a \le y \le x + a \le 2a); \\[2ex] \displaystyle\int_{x-a}^{y} f(t)\,f(x-t)\,f(y-t)\,dt, & (0 \le x - a \le y \le a); \\[2ex] \displaystyle\int_{x-a}^{a} f(t)\,f(x-t)\,f(y-t)\,dt, & (a \le y \le x \le 2a); \\[2ex] \displaystyle\int_{y-a}^{a} f(t)\,f(x-t)\,f(y-t)\,dt, & (a \le x \le y \le 2a). \end{cases} \tag{6.2}$$
In a) and b) above, the subscripts have been omitted from the $t_1$.

c) The regression curves of $y$ on $x$ and of $x$ on $y$ are linear and are given, respectively, by the following equations:
$$\bar{y} = \tfrac{1}{2}x + M, \tag{6.31}$$
and
$$\bar{x} = \tfrac{1}{2}y + M, \tag{6.32}$$
where $M = \int_0^a t\,f(t)\,dt$.

Hence, the coefficient of correlation between $x$ and $y$ is $\tfrac{1}{2}$.

This theorem is a direct generalization of Rietz's paper in Biometrika cited in the introduction to this paper. The proof may be supplied by the reader.

Illustration. Let us consider the rectangular distribution given by $f(t) = 1/a$, for $t$ on the range $0$ to $a$, and $a > 0$. This is the parent distribution in Rietz's case when $a = 1$. From (6.11), the marginal distribution of $x$ is
$$G_1(x) = \begin{cases} \dfrac{x}{a^2}, & (0 \le x \le a); \\[2ex] \dfrac{2a - x}{a^2}, & (a \le x \le 2a). \end{cases}$$
Similarly, the marginal distribution of $y$ is
$$G_2(y) = \begin{cases} \dfrac{y}{a^2}, & (0 \le y \le a); \\[2ex] \dfrac{2a - y}{a^2}, & (a \le y \le 2a). \end{cases}$$
The application of (6.2) yields
$$F(x, y) = \begin{cases} \dfrac{y}{a^3}, & (0 \le y \le x \le a); \\[2ex] \dfrac{x}{a^3}, & (0 \le x \le y \le a); \\[2ex] \dfrac{x - y + a}{a^3}, & (a \le y \le x + a \le 2a); \\[2ex] \dfrac{y - x + a}{a^3}, & (0 \le x - a \le y \le a); \\[2ex] \dfrac{2a - x}{a^3}, & (a \le y \le x \le 2a); \\[2ex] \dfrac{2a - y}{a^3}, & (a \le x \le y \le 2a). \end{cases}$$

These results, obtained directly by the use of Theorem IX, agree with those obtained by Rietz in the above-mentioned paper.
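The six sections of the rectangular surface can be compared with a single expression: for $f(t) = 1/a$, the integrand of (6.2) is $1/a^3$ wherever $t$, $x - t$ and $y - t$ all lie between $0$ and $a$, so $F(x, y)$ is the length of that interval of $t$ divided by $a^3$. The sketch below, with hypothetical function names and a crude midpoint rule, checks the case table against this overlap form and then checks the regression $\bar{y} = x/2 + a/2$.

```python
import math

def surface_by_cases(x, y, a):
    """Six-section form of F(x, y) for the rectangular parent f(t) = 1/a."""
    if not (0 <= x <= 2 * a and 0 <= y <= 2 * a):
        return 0.0
    if y <= x <= a:
        return y / a**3
    if x <= y <= a:
        return x / a**3
    if x <= a <= y:                       # section valid while y <= x + a
        return max(x - y + a, 0.0) / a**3
    if y <= a <= x:                       # section valid while x - a <= y
        return max(y - x + a, 0.0) / a**3
    if a <= y <= x:
        return (2 * a - x) / a**3
    return (2 * a - y) / a**3             # a <= x <= y

def surface_by_overlap(x, y, a):
    """Same surface from the length of the admissible interval of t."""
    lo = max(0.0, x - a, y - a)
    hi = min(a, x, y)
    return max(hi - lo, 0.0) / a**3

def regression_y_on_x(x, a, steps=4000):
    """Mean of y for a fixed x, by the midpoint rule over 0 <= y <= 2a."""
    h = 2 * a / steps
    ys = [(i + 0.5) * h for i in range(steps)]
    weights = [surface_by_overlap(x, y, a) for y in ys]
    return sum(y * w for y, w in zip(ys, weights)) / sum(weights)

a = 2.0
grid = [i * 0.25 for i in range(17)]      # 0, 0.25, ..., 4.0
agree = all(
    math.isclose(surface_by_cases(x, y, a), surface_by_overlap(x, y, a), abs_tol=1e-12)
    for x in grid for y in grid
)
y_bar = regression_y_on_x(0.5, 1.0)       # expected near 0.5/2 + 1/2 = 0.75
print(agree, y_bar)
```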


Carl H. Fischer.








ON THE CORRELATION BETWEEN CERTAIN 
AVERAGES FROM SMALL SAMPLES* 


By 


ALLEN T. CRAIG 


1. Introduction. It is well known that no correlation exists 
between the arithmetic mean and standard deviation of samples 
drawn at random from a normal universe. However, there seems 
to be in the literature no treatment of the correlation between 
other averages either for normal or non-normal universes. In the 
present paper, a few simple theorems are established which make 
possible the determination of the type of regression of the median 
on the arithmetic mean, of the range on the median, and of the 
range on the arithmetic mean. In case the regression is linear, 
the coefficient of correlation may be computed. 

We shall understand a probability function $f(x)$ of a real variable $x$ to be, for all values of $x$ on a range $R$,$^1$ a single-valued, non-negative, continuous function with $\int_R f(x)\,dx = 1$. Then $\int_a^b f(x)\,dx$ is the probability that a value of $x$ chosen at random lies in the interval $(a, b)$, where $a$ and $b$ are in $R$ and $a < b$; and $f(x)\,dx$ is, to within infinitesimals of higher order, the probability that a value of $x$ chosen at random lies in the interval $(x, x + dx)$. It will prove convenient to classify probability functions according as $R$ is the range $(-\infty, \infty)$, $(0, \infty)$, or $(0, k)$, $k > 0$. In accord with this classification we shall refer to probability functions as of the first, second, and third kinds respectively. In a similar manner, we define a probability function $F(x, y)$ of two independent variables.


*Presented to the American Mathematical Society, Dec. 28, 1931.
1 Cf. L. Bachelier, Calcul des Probabilités, p. 155.
2. The correlation between the arithmetic mean $\bar{x}$ and the range $W$.

Theorem I. Let $f(x)$ be the probability function of the variable $x$. Let $F(\bar{x}, W)$ be that of the arithmetic mean $\bar{x}$ and the range $W$ in samples of three independent values of $x$. If $f(x)$ is a probability function of the first kind, then
$$F(\bar{x}, W) = 18 \int_{\bar{x} + W/3}^{\bar{x} + 2W/3} f(x_1)\, f(x_1 - W)\, f(3\bar{x} - 2x_1 + W)\, dx_1.$$

Proof. Let $x_1$, $x_2$, $x_3$ be the three observed values of $x$. Write
$$x_1 + x_2 + x_3 = 3\bar{x}, \qquad x_1 - x_3 = W, \qquad x_3 \le x_2 \le x_1.$$
For $\bar{x}$ assigned, $-\infty < \bar{x} < \infty$, and $W$ assigned, $0 \le W < \infty$, we must have
$$\bar{x} + W/3 \le x_1 \le \bar{x} + 2W/3.$$
Let
$$x_2 = 3\bar{x} - 2x_1 + W, \qquad x_3 = x_1 - W.$$
The absolute value of the Jacobian is 3. Hence the theorem.

In the case of samples of four independent items $x_1$, $x_2$, $x_3$, $x_4$, the probability function $F(\bar{x}, W)$ is given by
$$F(\bar{x}, W) = 48 \int_{\bar{x} + W/4}^{\bar{x} + W/2} \int_{4\bar{x} - 3x_1 + W}^{x_1} f(x_1)\, f(x_2)\, f(4\bar{x} - 2x_1 - x_2 + W)\, f(x_1 - W)\, dx_2\, dx_1$$
$$\qquad + 48 \int_{\bar{x} + W/2}^{\bar{x} + 3W/4} \int_{x_1 - W}^{4\bar{x} - 3x_1 + 2W} f(x_1)\, f(x_2)\, f(4\bar{x} - 2x_1 - x_2 + W)\, f(x_1 - W)\, dx_2\, dx_1.$$
We note that the probability function is made up of the sum of two parts depending on whether $x_1$ is in the interval $(\bar{x} + W/4,\ \bar{x} + W/2)$ or in the interval $(\bar{x} + W/2,\ \bar{x} + 3W/4)$. Moreover, it may be of interest to note the overlapping of the ranges of integration of $x_2$. To prove that $F(\bar{x}, W)$ is given as stated, we take
$$x_1 + x_2 + x_3 + x_4 = 4\bar{x}, \qquad x_4 \le x_2,\ x_3 \le x_1, \qquad x_1 - x_4 = W. \tag{1}$$
From (1) it readily follows that
$$2x_1 + x_2 + x_3 = 4\bar{x} + W. \tag{2}$$
For assigned values of $\bar{x}$ and $W$, the upper limit on $x_1$ is found from (2) by taking $x_2 = x_3 = x_4 = x_1 - W$. Thus $x_1 = \bar{x} + 3W/4$. Similarly, the lower limit on $x_1$ is found from (2) by taking $x_2 = x_3 = x_1$. Thus $x_1 = \bar{x} + W/4$. But $x_2$ may not always be as large as $x_1$ for all values of $x_1$. This may be seen by taking $x_2 = x_1$ and $x_3 = x_4 = x_1 - W$ in (2). This leads to $x_1 = \bar{x} + W/2$. Thus, for $\bar{x} + W/4 \le x_1 \le \bar{x} + W/2$, we see that $x_1$ is the upper limit on $x_2$. To determine the lower limit on $x_2$ for this region of variation of $x_1$, we select $x_2$ as near $x_4 = x_1 - W$ as is possible without causing $x_3$ to exceed $x_1$. But $x_3 = 4\bar{x} - 2x_1 - x_2 + W$. At most, then, $4\bar{x} - 2x_1 - x_2 + W = x_1$, or $x_2 = 4\bar{x} - 3x_1 + W$. Thus we have established the limits of integration used in the first part of the sum of which $F(\bar{x}, W)$ consists. A similar argument shows, if $\bar{x} + W/2 \le x_1 \le \bar{x} + 3W/4$, that
$$x_1 - W \le x_2 \le 4\bar{x} - 3x_1 + 2W.$$
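If $f(x)$ is a probability function of the second kind, we observe in samples of three independent items $x_1$, $x_2$, $x_3$, for $\bar{x}$ assigned, that $0 \le W \le 3\bar{x}$. If $0 \le W \le 3\bar{x}/2$, we have
$$\bar{x} + W/3 \le x_1 \le \bar{x} + 2W/3, \qquad x_2 = 3\bar{x} - 2x_1 + W, \qquad x_3 = x_1 - W,$$
and if $3\bar{x}/2 \le W \le 3\bar{x}$, we have
$$W \le x_1 \le \bar{x} + 2W/3, \qquad x_2 = 3\bar{x} - 2x_1 + W, \qquad x_3 = x_1 - W.$$
Accordingly,
$$F(\bar{x}, W) = \begin{cases} 18 \displaystyle\int_{\bar{x} + W/3}^{\bar{x} + 2W/3} f(x_1)\, f(x_1 - W)\, f(3\bar{x} - 2x_1 + W)\, dx_1, & 0 \le W \le 3\bar{x}/2, \\[2ex] 18 \displaystyle\int_{W}^{\bar{x} + 2W/3} f(x_1)\, f(x_1 - W)\, f(3\bar{x} - 2x_1 + W)\, dx_1, & 3\bar{x}/2 \le W \le 3\bar{x}. \end{cases}$$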

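In samples of four independent items $x_1$, $x_2$, $x_3$, $x_4$, drawn from a universe characterized by a law of probability of this kind, we find, with $\Phi$ denoting $f(x_1)\, f(x_2)\, f(x_1 - W)\, f(4\bar{x} - 2x_1 - x_2 + W)$,
$$F(\bar{x}, W) = 48 \int_{\bar{x} + W/4}^{\bar{x} + W/2} \int_{4\bar{x} - 3x_1 + W}^{x_1} \Phi\, dx_2\, dx_1 + 48 \int_{\bar{x} + W/2}^{\bar{x} + 3W/4} \int_{x_1 - W}^{4\bar{x} - 3x_1 + 2W} \Phi\, dx_2\, dx_1, \qquad 0 \le W \le 4\bar{x}/3,$$
$$= 48 \int_{W}^{\bar{x} + W/2} \int_{4\bar{x} - 3x_1 + W}^{x_1} \Phi\, dx_2\, dx_1 + 48 \int_{\bar{x} + W/2}^{\bar{x} + 3W/4} \int_{x_1 - W}^{4\bar{x} - 3x_1 + 2W} \Phi\, dx_2\, dx_1, \qquad 4\bar{x}/3 \le W \le 2\bar{x},$$
$$= 48 \int_{W}^{\bar{x} + 3W/4} \int_{x_1 - W}^{4\bar{x} - 3x_1 + 2W} \Phi\, dx_2\, dx_1, \qquad 2\bar{x} \le W \le 4\bar{x}.$$

Finally, consider $f(x)$ to be a probability function of the third kind. In samples of three independent items $x_1$, $x_2$, $x_3$, for $0 \le \bar{x} \le k/3$, we obtain $0 \le W \le 3\bar{x}$; for $k/3 \le \bar{x} \le 2k/3$, we obtain $0 \le W \le k$; for $2k/3 \le \bar{x} \le k$, we obtain $0 \le W \le 3(k - \bar{x})$. It is fairly easy to see that for $\bar{x}$ and $W$ assigned as indicated, the following regions of selection of $x_1$ are valid:

for $0 \le \bar{x} \le k/2$ and $0 \le W \le 3\bar{x}/2$, or for $k/2 \le \bar{x} \le k$ and $0 \le W \le 3(k - \bar{x})/2$, then $\bar{x} + W/3 \le x_1 \le \bar{x} + 2W/3$;

for $0 \le \bar{x} \le k/3$ and $3\bar{x}/2 \le W \le 3\bar{x}$, or for $k/3 \le \bar{x} \le k/2$ and $3\bar{x}/2 \le W \le 3(k - \bar{x})/2$, then $W \le x_1 \le \bar{x} + 2W/3$;

for $2k/3 \le \bar{x} \le k$ and $3(k - \bar{x})/2 \le W \le 3(k - \bar{x})$, or for $k/2 \le \bar{x} \le 2k/3$ and $3(k - \bar{x})/2 \le W \le 3\bar{x}/2$, then $\bar{x} + W/3 \le x_1 \le k$;

for $k/3 \le \bar{x} \le k/2$ and $3(k - \bar{x})/2 \le W \le k$, or for $k/2 \le \bar{x} \le 2k/3$ and $3\bar{x}/2 \le W \le k$, then $W \le x_1 \le k$.

Thus,
$$F(\bar{x}, W) = 18 \int_{\bar{x} + W/3}^{\bar{x} + 2W/3} f(x_1)\, f(x_1 - W)\, f(3\bar{x} - 2x_1 + W)\, dx_1,$$
$$= 18 \int_{W}^{\bar{x} + 2W/3} f(x_1)\, f(x_1 - W)\, f(3\bar{x} - 2x_1 + W)\, dx_1,$$
$$= 18 \int_{\bar{x} + W/3}^{k} f(x_1)\, f(x_1 - W)\, f(3\bar{x} - 2x_1 + W)\, dx_1,$$
$$= 18 \int_{W}^{k} f(x_1)\, f(x_1 - W)\, f(3\bar{x} - 2x_1 + W)\, dx_1,$$
over those regions of the $\bar{x}W$-plane indicated above.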

In case of samples of four independent items $x_1$, $x_2$, $x_3$, $x_4$, drawn from a universe characterized by a probability function of the third kind, for $0 \le \bar{x} \le k/4$, we obtain $0 \le W \le 4\bar{x}$; for $k/4 \le \bar{x} \le 3k/4$, we obtain $0 \le W \le k$; for $3k/4 \le \bar{x} \le k$, we obtain $0 \le W \le 4(k - \bar{x})$. Let us denote as follows the regions of the $\bar{x}W$-plane bounded by the given lines:


X= O 
cfs 4z 


7 
cov = 2(k-x%) 


3 
t(4-x) W = 4(4-x) 
aie) ue 
B =2X¥ F),W = ns 
' faze ©) w= (hz) 
W=2z% W = 2(k-z) 
fw 2. x) ”* * 
- (G)\w = 
oa 4% WV - PY, 
a — 
(D){W= os W = & ! 
4(H-X x 
We ttre (H W aa = - 
Further, let W = 2h-%) 


O= F(x) t(x,) tH <,-W)t(42-2x,-x,+W) 


ba — 
J [eezax-( ) 2 
a @ec 


It is then not difficult to verify that 


saw0a| (4 x, Je (2 4x - Jz, “ye | (A) 
ZW 42-32, +W, +e 


z+¥ %, eo SE 4AX- ISX, +2ow 
=48 or Ww , (B) 
w 4% -3x,+W, ¥+>F z,-W 


and let 


a on , 
%+F 4X Ix, +W eI, (C) 
w x,-W 


AL-Fx,+2W 
* lege ah © 
(<2 42-3x,+w) (28 -" 


zi 





i a 
-46| (, iw 4z- a oe . (2) 
- (eF 4z- — Je. ” F) 
i (@ 4z- ~se (G) 


_ i er - -_ 


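As illustrations of these theorems, let us find the correlation between the range and the mean for universes of specified types.

Example 1. Let $f(x) = e^{-x}$, $0 \le x < \infty$.

For samples of three items, we have
$$F(\bar{x}, W) = \begin{cases} 6W e^{-3\bar{x}}, & 0 \le W \le 3\bar{x}/2, \\[1ex] 18\left(\bar{x} - W/3\right) e^{-3\bar{x}}, & 3\bar{x}/2 \le W \le 3\bar{x}. \end{cases}$$
The distributions of the marginal totals of $W$ and $\bar{x}$ are obtained by integrating $F(\bar{x}, W)$ with regard to $\bar{x}$ and $W$ respectively. We readily find
$$\varphi(\bar{x}) = \tfrac{27}{2}\,\bar{x}^2 e^{-3\bar{x}}, \qquad 0 \le \bar{x} < \infty,$$
and
$$\psi(W) = 2e^{-W}\left(1 - e^{-W}\right), \qquad 0 \le W < \infty,$$
as previously given by the writer.$^2$ For $\bar{x}$ assigned, the mean of the array of $W$ is $\bar{W}_{\bar{x}} = 3\bar{x}/2$. Thus the regression of $W$ on $\bar{x}$ is linear and $r_{W\bar{x}} = \sqrt{3/5}$.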


2American Journal of Mathematics, Vol. 54 (1932), pp. 359, 366. 
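Example 1 lends itself to a direct sampling experiment. The sketch below is illustrative and assumes NumPy; the sample size and seed are arbitrary. It draws samples of three from $f(x) = e^{-x}$, fits the least-squares line of $W$ on $\bar{x}$, and compares the slope with $3/2$. The comparison value $\sqrt{3/5}$ for the coefficient of correlation is a computation made here from the slope $3/2$ together with the variances $1/3$ of the mean and $5/4$ of the range of three such values.

```python
import numpy as np

rng = np.random.default_rng(2)
samples = rng.exponential(size=(400_000, 3))     # samples of three from f(x) = exp(-x)
xbar = samples.mean(axis=1)                      # arithmetic mean of each sample
W = samples.max(axis=1) - samples.min(axis=1)    # range of each sample

slope, intercept = np.polyfit(xbar, W, 1)        # least-squares line of W on xbar
r = np.corrcoef(xbar, W)[0, 1]
print(slope, intercept, r, np.sqrt(3 / 5))
```

A slope near $1.5$ with an intercept near zero is what the linear regression $\bar{W}_{\bar{x}} = 3\bar{x}/2$ predicts.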
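Example 2. Let $f(x) = 1/k$, $0 \le x \le k$.

For samples of three items, we have
$$F(\bar{x}, W) = \frac{6W}{k^3},$$
$$= \frac{18}{k^3}\left(\bar{x} - \frac{W}{3}\right),$$
$$= \frac{18}{k^3}\left(k - \bar{x} - \frac{W}{3}\right),$$
$$= \frac{18}{k^3}\,(k - W),$$
over those regions of the $\bar{x}W$-plane indicated above. The marginal totals$^3$ are distributed in accord with
$$\varphi(\bar{x}) = \begin{cases} \dfrac{27\bar{x}^2}{2k^3}, & 0 \le \bar{x} \le \dfrac{k}{3}; \\[2ex] \dfrac{9\left(-6\bar{x}^2 + 6k\bar{x} - k^2\right)}{2k^3}, & \dfrac{k}{3} \le \bar{x} \le \dfrac{2k}{3}; \\[2ex] \dfrac{27(k - \bar{x})^2}{2k^3}, & \dfrac{2k}{3} \le \bar{x} \le k, \end{cases}$$
and
$$\psi(W) = \frac{6W(k - W)}{k^3}, \qquad 0 \le W \le k.$$
We readily find
$$\bar{W}_{\bar{x}} = \begin{cases} \dfrac{3\bar{x}}{2}, & 0 \le \bar{x} \le \dfrac{k}{3}; \\[2ex] \dfrac{5k^3 - 27k^2\bar{x} + 27k\bar{x}^2}{6k^2 - 36k\bar{x} + 36\bar{x}^2}, & \dfrac{k}{3} \le \bar{x} \le \dfrac{2k}{3}; \\[2ex] \dfrac{3}{2}(k - \bar{x}), & \dfrac{2k}{3} \le \bar{x} \le k. \end{cases}$$

3 Cf. H. L. Rietz, On a Certain Law of Probability of Laplace, Proc. Int. Math. Congress, Toronto (1924), pp. 795-799.
J. O. Irwin, On the Frequency Distributions of Means, etc., Biometrika, Vol. 19 (1927), pp. 225-239.
P. Hall, The Distribution of Means for Samples of Size N, Biometrika, Vol. 19 (1927), pp. 240-245.
J. Neyman and E. S. Pearson, On the Use and Distribution of Certain Test Criteria, Biometrika, Vol. 20 (1928), p. 210.

Thus the regression curve of $W$ on $\bar{x}$ is continuous, but the regression is non-linear for $k/3 < \bar{x} < 2k/3$.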
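The three-piece regression of Example 2 can be checked by integrating the surface numerically. For the rectangular parent the integrand of Theorem I is constant, so $F(\bar{x}, W)$ is proportional to the length of the admissible interval of $x_1$, namely from $\max(\bar{x} + W/3,\ W)$ to $\min(\bar{x} + 2W/3,\ k)$. The sketch below rests on that observation; the midpoint rule and the function names are choices made here, not part of the original.

```python
def admissible_length(xbar, W, k):
    """Length of the interval of x1 allowed for given mean xbar and range W."""
    lo = max(xbar + W / 3.0, W)              # x2 <= x1, and x3 = x1 - W >= 0
    hi = min(xbar + 2.0 * W / 3.0, k)        # x3 <= x2, and x1 <= k
    return max(hi - lo, 0.0)

def mean_range_numeric(xbar, k, steps=30_000):
    """Mean of W for fixed xbar, by the midpoint rule over 0 <= W <= k."""
    h = k / steps
    num = den = 0.0
    for i in range(steps):
        W = (i + 0.5) * h
        length = admissible_length(xbar, W, k)
        num += W * length
        den += length
    return num / den

def mean_range_formula(xbar, k):
    """Three-piece regression of W on xbar for the rectangular parent."""
    if xbar <= k / 3.0:
        return 1.5 * xbar
    if xbar >= 2.0 * k / 3.0:
        return 1.5 * (k - xbar)
    return ((5 * k**3 - 27 * k**2 * xbar + 27 * k * xbar**2)
            / (6 * k**2 - 36 * k * xbar + 36 * xbar**2))

k = 1.0
checks = [(xb, mean_range_numeric(xb, k), mean_range_formula(xb, k))
          for xb in (0.2, 0.4, 0.5, 0.6, 0.9)]
for row in checks:
    print(row)
```

The numerical and closed-form values should agree at each point, including inside the middle interval where the regression is not linear.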

3. The correlation between the arithmetic mean % and the 
median € . 

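Theorem II. Let $f(x)$ be the probability function of the variable $x$. Let $F_1(\bar{x}, \xi)$ be that of the arithmetic mean $\bar{x}$ and the median $\xi$ in samples of three independent values of $x$. If $f(x)$ is a probability function of the first kind, then

$$\begin{aligned}
F_1(\bar{x}, \xi) &= 18 f(\xi) \int_{3\bar{x} - 2\xi}^{\infty} f(x_3)\, f(3\bar{x} - \xi - x_3)\, dx_3, & \xi \le \bar{x},\\
&= 18 f(\xi) \int_{\xi}^{\infty} f(x_3)\, f(3\bar{x} - \xi - x_3)\, dx_3, & \bar{x} \le \xi.
\end{aligned}$$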
¥ 


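Proof. Let $x_1, x_2, x_3$ be the three observed values of $x$. Write

$$x_1 + x_2 + x_3 = 3\bar{x}, \qquad x_2 = \xi, \qquad x_1 \le x_2 \le x_3.$$

For $\bar{x}$ and $\xi$ assigned, $\xi < \bar{x}$, we must have

$$3\bar{x} - 2\xi \le x_3 < \infty, \qquad x_2 = \xi, \qquad x_1 = 3\bar{x} - \xi - x_3,$$

and for $\bar{x} < \xi$,

$$\xi \le x_3 < \infty, \qquad x_2 = \xi, \qquad x_1 = 3\bar{x} - \xi - x_3.$$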
If we consider all possible arrangements of $x_1, x_2, x_3$, we have


Bledaaas -6rEag fhe) tig) de, dey, E43 


Bag | He) HeJaxdy, 256. 
f 


The change of variable $x_1 = 3\bar{x} - \xi - x_3$ establishes the theorem.
In case of samples of five independent items $x_1, x_2, x_3, x_4, x_5$, the probability function $F_2(\bar{x}, \xi)$ is given by


R-4¥ co 


El. sescorty [ [ ee WW5R-F-x, 14, %, i, de, ae, 


Cit 3F-4 St fx, 


00 oo ~ 
1501(¢)f | the,) F(x, ) (2, tS 2-9-1, x, kily te, de, &<%, 


S2-49"°P SR -2F -14,-%, 
00 pF 
=L50(¢) / Fle) (4, )t (2, Wl Tie-$-%,-%,-%, Jala, ae, de, zs. 
= SR oP aig 


This follows immediately from the fact that for $\bar{x}$ and $\xi$ assigned, $\xi \le \bar{x}$, we may have either


Os 4, <52-4F 
JX -IF -x%, $ %,< 00, 
J% -2F -%,- 4%, <4, <6, 
x, 
4,* J8-$-%,-%,-%,, 
or 
Jz -4F < x, <00, 
fs Xz <o%, 
TR -2E -%, -%, 5%, 5 
%-F 


ee ae 
a 





SO Ey ES 
and for $\bar{x} < \xi$, we must have


&< x, <00 
= X%y< 00 
SH -2F-x %, < %= 9, 
°F 
X= IR -F- x, -%,- %, 


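If $f(x)$ is a probability function of the second kind, it is clear that $0 \le \xi \le 3\bar{x}/2$ in samples of three items. Then

$$\begin{aligned}
F_1(\bar{x}, \xi) &= 18 f(\xi) \int_{3\bar{x} - 2\xi}^{3\bar{x} - \xi} f(x_3)\, f(3\bar{x} - \xi - x_3)\, dx_3, & 0 \le \xi \le \bar{x},\\
&= 18 f(\xi) \int_{\xi}^{3\bar{x} - \xi} f(x_3)\, f(3\bar{x} - \xi - x_3)\, dx_3, & \bar{x} \le \xi \le \tfrac{3\bar{x}}{2}.
\end{aligned}$$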
~ 


In case of samples of five independent items drawn at random from a universe characterized by a probability function of the second kind, $F_2(\bar{x}, \xi)$ can best be expressed in a form employing the notation used previously. Thus we write


$= He) t(x,)t(4, )H52-F- x, -2,- x, ) 


u,, = J#%-iF- XL, -~Xy---+- ~%;, 


oe? barf 
/// fazaxag-(_ | 4 
Then 
(yy 42, F “qo “: 42 
E(4,$)=1501E) ( z Je ( ) g 
fin uy & ( “yg tt a g 
Lo £ a)? L, “zy oO 


“ig “13 “12 " 
7 (“ £ ) 4. eae ms 


1 0 


and 
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= 1500/2) es a * z je i: ‘mn; .) g 


{| ““o “4 “e) ¢ | sds Fz 
“Go F 9 + 


-zsorte)|( 2 s |, Yrs - 


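Finally, consider $f(x)$ to be a probability function of the third kind. In samples of three independent items, for $0 \le \bar{x} \le k/3$, we obtain $0 \le \xi \le 3\bar{x}/2$; for $k/3 \le \bar{x} \le 2k/3$, we obtain $(3\bar{x} - k)/2 \le \xi \le 3\bar{x}/2$; for $2k/3 \le \bar{x} \le k$, we obtain $(3\bar{x} - k)/2 \le \xi \le k$. It is not difficult to verify that, for $\bar{x}$ and $\xi$ assigned as indicated, the following regions of selection of $x_3$ are valid:
for $0 \le \bar{x} \le k/3$ and $0 \le \xi \le \bar{x}$,
or for $k/3 \le \bar{x} \le k/2$ and $3\bar{x} - k \le \xi \le \bar{x}$, then $3\bar{x} - 2\xi \le x_3 \le 3\bar{x} - \xi$;
for $k/3 \le \bar{x} \le k/2$ and $(3\bar{x} - k)/2 \le \xi \le 3\bar{x} - k$,
or for $k/2 \le \bar{x} \le k$ and $(3\bar{x} - k)/2 \le \xi \le \bar{x}$, then $3\bar{x} - 2\xi \le x_3 \le k$;
for $0 \le \bar{x} \le k/2$ and $\bar{x} \le \xi \le 3\bar{x}/2$,
or for $k/2 \le \bar{x} \le 2k/3$ and $3\bar{x} - k \le \xi \le 3\bar{x}/2$, then $\xi \le x_3 \le 3\bar{x} - \xi$;
for $k/2 \le \bar{x} \le 2k/3$ and $\bar{x} \le \xi \le 3\bar{x} - k$,
or for $2k/3 \le \bar{x} \le k$ and $\bar{x} \le \xi \le k$, then $\xi \le x_3 \le k$.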


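Thus

$$\begin{aligned}
F_1(\bar{x}, \xi) &= 18 f(\xi) \int_{3\bar{x} - 2\xi}^{3\bar{x} - \xi} f(x_3)\, f(3\bar{x} - \xi - x_3)\, dx_3,\\
&= 18 f(\xi) \int_{3\bar{x} - 2\xi}^{k} f(x_3)\, f(3\bar{x} - \xi - x_3)\, dx_3,\\
&= 18 f(\xi) \int_{\xi}^{3\bar{x} - \xi} f(x_3)\, f(3\bar{x} - \xi - x_3)\, dx_3,\\
&= 18 f(\xi) \int_{\xi}^{k} f(x_3)\, f(3\bar{x} - \xi - x_3)\, dx_3,
\end{aligned}$$

over those regions of the $\bar{x}\xi$-plane as indicated above.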

With samples of five items, the correlation surface is defined 
in so many parts that we shall not take the space necessary to 
consider it. 

As illustrations of these theorems, we shall find the correla- 
tion between the median and the mean for universes of specified 
types. 

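Example 1. Let $f(x) = e^{-x}$, $0 \le x < \infty$.

For samples of three items, we have

$$\begin{aligned}
F_1(\bar{x}, \xi) &= 18\,\xi\, e^{-3\bar{x}}, & 0 \le \xi \le \bar{x},\\
&= 18\,(3\bar{x} - 2\xi)\, e^{-3\bar{x}}, & \bar{x} \le \xi \le \tfrac{3\bar{x}}{2}.
\end{aligned}$$

The distribution function of the marginal totals of $\xi$ is given by⁴

$$\psi(\xi) = 6e^{-2\xi}\left(1 - e^{-\xi}\right), \qquad 0 \le \xi < \infty.$$

For $\bar{x}$ assigned, the mean of the array of $\xi$ is $\bar{\xi}_{\bar{x}} = \tfrac{5\bar{x}}{6}$. Thus the regression of $\xi$ on $\bar{x}$ is linear and $r = 5/\sqrt{39}$.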
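Example 2. Let $f(x) = 1/k$, $0 \le x \le k$.
For samples of three items, we have

$$\begin{aligned}
F_1(\bar{x}, \xi) &= \frac{18}{k^3}\,\xi,\\
&= \frac{18}{k^3}\,(k - 3\bar{x} + 2\xi),\\
&= \frac{18}{k^3}\,(3\bar{x} - 2\xi),\\
&= \frac{18}{k^3}\,(k - \xi),
\end{aligned}$$

over those regions of the $\bar{x}\xi$-plane indicated above. The distribution function of the marginal totals of $\xi$ is given by⁵

$$\psi(\xi) = \frac{6\,\xi\,(k - \xi)}{k^3}, \qquad 0 \le \xi \le k.$$

We find

$$\begin{aligned}
\bar{\xi}_{\bar{x}} &= \frac{5\bar{x}}{6}, & 0 < \bar{x} \le \tfrac{k}{3},\\
&= \frac{5\bar{x}^3 - (3\bar{x} - k)^3}{6\bar{x}^2 - 2(3\bar{x} - k)^2}, & \tfrac{k}{3} \le \bar{x} \le \tfrac{2k}{3},\\
&= \frac{5\bar{x} + k}{6}, & \tfrac{2k}{3} \le \bar{x} \le k.
\end{aligned}$$

Thus the regression curve of $\xi$ on $\bar{x}$ is continuous but the regression is non-linear for $k/3 \le \bar{x} \le 2k/3$.

⁴Cf. American Journal of Mathematics, Vol. 54 (1932), p. 364.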

4. The correlation between the median $\xi$ and the range $W$.

Theorem III. Let $f(x)$ be the probability function of the variable $x$. Let $F_m(\xi, W)$ be that of the median $\xi$ and the range $W$ in samples of $2m+1$ independent values of $x$. If $f(x)$ is a probability function of the first kind, then

$$F_m(\xi, W) = \frac{(2m+1)!}{[(m-1)!]^2}\, f(\xi) \int_{\xi}^{\xi + W} f(x_{2m+1})\, f(x_{2m+1} - W) \left[\int_{x_{2m+1} - W}^{\xi} f(t)\, dt\right]^{m-1} \left[\int_{\xi}^{x_{2m+1}} f(t)\, dt\right]^{m-1} dx_{2m+1}.$$


Proof. We have 


5Cf. P. R. Rider, On the Distribution of the Ratio of Mean to Stand- 
ard Deviation, etc., Biometrika, Vol. 21 (1929), pp. 136-137. 
Hence the theorem.


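If $f(x)$ is a probability function of the second kind, then

$$\begin{aligned}
F_m(\xi, W) &= \frac{(2m+1)!}{[(m-1)!]^2}\, f(\xi) \int_{\xi}^{\xi + W} f(x_{2m+1})\, f(x_{2m+1} - W) \left[\int_{x_{2m+1} - W}^{\xi} f(t)\, dt\right]^{m-1} \left[\int_{\xi}^{x_{2m+1}} f(t)\, dt\right]^{m-1} dx_{2m+1}, & W \le \xi,\\
&= \frac{(2m+1)!}{[(m-1)!]^2}\, f(\xi) \int_{W}^{\xi + W} f(x_{2m+1})\, f(x_{2m+1} - W) \left[\int_{x_{2m+1} - W}^{\xi} f(t)\, dt\right]^{m-1} \left[\int_{\xi}^{x_{2m+1}} f(t)\, dt\right]^{m-1} dx_{2m+1}, & \xi \le W.
\end{aligned}$$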


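Finally, consider $f(x)$ to be a probability function of the third kind. We observe for $0 \le \xi \le k$, that $0 \le W \le k$. For assigned values of $\xi$ and $W$, the following regions of selection of $x_{2m+1}$ are obvious:

for $0 \le \xi \le k/2$ and $0 \le W \le \xi$,
or for $k/2 \le \xi \le k$ and $0 \le W \le k - \xi$, then $\xi \le x_{2m+1} \le \xi + W$;
for $0 \le \xi \le k/2$ and $\xi \le W \le k - \xi$, then $W \le x_{2m+1} \le \xi + W$;
for $0 \le \xi \le k/2$ and $k - \xi \le W \le k$,
or for $k/2 \le \xi \le k$ and $\xi \le W \le k$, then $W \le x_{2m+1} \le k$;
for $k/2 \le \xi \le k$ and $k - \xi \le W \le \xi$, then $\xi \le x_{2m+1} \le k$.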


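If we write

$$Y = f(x_{2m+1})\, f(x_{2m+1} - W) \left[\int_{x_{2m+1} - W}^{\xi} f(t)\, dt\right]^{m-1} \left[\int_{\xi}^{x_{2m+1}} f(t)\, dt\right]^{m-1},$$

we have

$$\begin{aligned}
F_m(\xi, W) &= \frac{(2m+1)!}{[(m-1)!]^2}\, f(\xi) \int_{\xi}^{\xi + W} Y\, dx_{2m+1},\\
&= \frac{(2m+1)!}{[(m-1)!]^2}\, f(\xi) \int_{W}^{\xi + W} Y\, dx_{2m+1},\\
&= \frac{(2m+1)!}{[(m-1)!]^2}\, f(\xi) \int_{W}^{k} Y\, dx_{2m+1},\\
&= \frac{(2m+1)!}{[(m-1)!]^2}\, f(\xi) \int_{\xi}^{k} Y\, dx_{2m+1},
\end{aligned}$$

over those regions of the $\xi W$-plane previously indicated.
We shall consider two simple examples.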
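Example 1. Let $f(x) = e^{-x}$, $0 \le x < \infty$.

With samples of three items,

$$\begin{aligned}
F_1(\xi, W) &= 3e^{-3\xi}\left(e^{W} - e^{-W}\right), & W \le \xi,\\
&= 3e^{-W}\left(e^{-\xi} - e^{-3\xi}\right), & \xi \le W.
\end{aligned}$$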


The regression is readily shown to be non-linear. 
Example 2. Let $f(x) = 1/k$, $0 \le x \le k$.
With samples of three items,

$$\begin{aligned}
F_1(\xi, W) &= \frac{6W}{k^3},\\
&= \frac{6\xi}{k^3},\\
&= \frac{6}{k^3}\,(k - W),\\
&= \frac{6}{k^3}\,(k - \xi),
\end{aligned}$$

over those regions of the $\xi W$-plane which have been previously given. The mean of the array of $W$ corresponding to an assigned $\xi$ is $\bar{W}_{\xi} = k/2$. Accordingly, there is no correlation between the median and the range in samples of three items drawn from this universe.
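The constant array mean in this example can be checked numerically. The sketch below is illustrative and not part of the original paper: it assumes the density is $6/k^3$ times the length of the admissible interval for the largest observation, namely from $\max(\xi, W)$ to $\min(\xi + W, k)$, which reproduces the four pieces listed for the rectangular universe.

```python
def joint_density(xi, w, k=1.0):
    """Assumed F_1(xi, W) for f(x) = 1/k on [0, k], samples of three."""
    lo = max(xi, w)          # the largest item is at least xi and at least W
    hi = min(xi + w, k)      # the smallest item is at most xi; the largest at most k
    return 6.0 * max(0.0, hi - lo) / k**3

def array_mean_of_range(xi, k=1.0, steps=20000):
    """Midpoint-rule estimate of the mean of W for an assigned median xi."""
    h = k / steps
    num = den = 0.0
    for i in range(steps):
        w = (i + 0.5) * h
        d = joint_density(xi, w, k)
        num += w * d
        den += d
    return num / den

for xi in (0.1, 0.3, 0.5, 0.8):
    print(xi, array_mean_of_range(xi))   # each close to k/2 = 0.5
```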





It is easy to employ the type of argument used in establishing Theorem III to obtain the probability function of the median and lower quartile. Thus, if $f(x)$ is a probability function of the second kind and $F(\xi, \eta)$ is the probability function of the median $\xi$ and the lower quartile $\eta$ in samples of $4m+1$ items, then


(4Am+l)/ ” | f 2 ” 
Ff’ by aunniemnunemncmue Sid Oe fi tf 
" ey mL m-1)! - of (et Vo at 


{fired - wa ¥. 
Rete /~ 


- 





ON THE DEGREE OF APPROXIMATION OF 
CERTAIN QUADRATURE FORMULAS 
By 
A. L. O'Toole
National Research Fellow. 

If $f(x)$ be a continuous function of period $2\pi$, and if the interval under consideration, say the interval from $0$ to $2\pi$, be divided into $n$ equal parts by the $n+1$ points $x_i = 2\pi i/n$, $i = 0, 1, \ldots, n$, then the trigonometric sum of the $m$th order coinciding in value with $f(x)$ at the $n+1$ points $x_i$, or the trigonometric sum of the $m$th order lacking the term in $\sin mx$, is, according as $n = 2m+1$ or $n = 2m$,

$$T_m(x) = \tfrac{1}{2} a_0 + a_1 \cos x + a_2 \cos 2x + \cdots + a_m \cos mx + b_1 \sin x + b_2 \sin 2x + \cdots + b_m \sin mx$$

or

$$T_m(x) = \tfrac{1}{2} a_0 + a_1 \cos x + a_2 \cos 2x + \cdots + \tfrac{1}{2} a_m \cos mx + b_1 \sin x + b_2 \sin 2x + \cdots + b_{m-1} \sin (m-1)x,$$

where

$$a_k = \frac{2}{n} \sum_{i=0}^{n-1} f(x_i) \cos kx_i, \qquad b_k = \frac{2}{n} \sum_{i=0}^{n-1} f(x_i) \sin kx_i.$$

If the Fourier coefficients of $f(x)$ be denoted by

$$\alpha_k = \frac{1}{\pi} \int_0^{2\pi} f(x) \cos kx\, dx, \qquad \beta_k = \frac{1}{\pi} \int_0^{2\pi} f(x) \sin kx\, dx,$$
then it has been shown¹ that the interpolating coefficients $a_k$ and $b_k$ are approximations to the Fourier coefficients $\alpha_k$ and $\beta_k$ in the sense of the rectangle quadrature formula, in the sense of the trapezoid quadrature formula, in the sense of the average of the results of two applications of Simpson's formula, and in the sense of higher quadrature formulas. In other words, the simple rectangle formulas $a_k$ and $b_k$ are as good approximations to the areas $\alpha_k$ and $\beta_k$ as the estimates given by the trapezoid rule, the average of two applications of Simpson's rule, or higher quadrature formulas.

It is the purpose of this note to discuss certain quadrature 
formulas and to observe some other conditions under which the 
rectangle formula will give as good an approximation as the more 
complicated formulas. 

The most elementary and best known of the formulas are the 
rectangle formula, the trapezoid formula, and Simpson’s formula. 
Many of the more complex rules are the results of attempts by 
different investigators? to improve by various devices the approx- 
imations given by these three simple rules. 

Suppose the area under consideration is bounded by the curve $y = f(x)$, the $x$-axis and the ordinates at $x = a$ and $x = b$. If the interval from $a$ to $b$ be divided into $n$ equal³ parts, say of length $h$, by the $n+1$ points $x_0 = a, x_1, x_2, \ldots, x_n = b$, and if rectangles, each of width $h$ and height $y_\nu$, $\nu = 0, 1, 2, \ldots, n-1$, be constructed, then the area as approximated by these $n$ rectangles is

$$\text{(1)}\qquad A = h \sum_{\nu=0}^{n-1} y_\nu.$$


¹D. Jackson, Some Notes on Trigonometric Interpolation, Amer. Math. Monthly, vol. xxxiii, no. 8, October 1927.

²See Runge and Willers, Encyklopädie der Mathematischen Wissenschaften, Bd. II:3 (1915), pp. 45-176.

³Discussion from point of view of least squares, Otto Biermann, Monatshefte für Mathematik und Physik, 14 (1903), pp. 226-242.

For unequal intervals see Jas. W. Glover, International Mathematical 
Congress, Toronto, 1924. 
To find an expression for the error we assume the first derivative exists, so that for the first rectangle

$$f(x) = f(a) + (x - a)\, f'(u),$$

$$\int_a^{a+h} f(x)\, dx = h f(a) + \frac{h^2}{2} f'(z), \qquad a < z < a + h.$$

Hence the error for the $n$ rectangles is

$$\text{(1e)}\qquad E = \frac{h^2}{2} \sum_{\nu=0}^{n-1} f'(z_\nu) = \frac{(b-a)^2}{2n} f'(z), \qquad a < z < b,$$

i.e., an error of the order of $h$.
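The first-order behaviour of the rectangle formula is easy to observe numerically. The following minimal sketch is illustrative only: the function name `rectangle` and the test integrand $e^x$ on $[0, 1]$ are assumptions, not taken from the note, and the error is taken as true area minus approximation.

```python
import math

def rectangle(f, a, b, n):
    """Formula (1): h times the sum of the ordinates y_0 ... y_{n-1}."""
    h = (b - a) / n
    return h * sum(f(a + v * h) for v in range(n))

exact = math.e - 1.0          # integral of exp(x) from 0 to 1
errors = {n: exact - rectangle(math.exp, 0.0, 1.0, n) for n in (10, 20, 40)}
for n, err in errors.items():
    print(n, err)
# Halving h roughly halves the error, as expected for an error of order h.
print(errors[10] / errors[20], errors[20] / errors[40])
```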

Let $n = mk$, $k = 1, 2, 3, \ldots$. If we approximate the area in the first $k$ subintervals by a parabola of degree $k$ coinciding in value with $f(x)$ at the first $k+1$ values of $x$, then integrating Lagrange's interpolation formula an expression for the error is obtained. If $k$ is odd then


AY A+e fart ) 


E <6 Fz) soe , H= kh, 


where 
2 pa Je aye. 72 42 Ch- 
Q Geyer Ut L°Mk*t?- 32). Che 42 h-2)) att 


If $k$ is even, then making use of Rolle's Theorem,


HY Art F (ated, ) 
=G soe 
(2 z/ A+! 


where 


C, -553 2 ft (t21HtW bt? 22)... (ht ?- (-2) Jatt. 


The error over the whole interval will be obtained by summing the $m$ errors corresponding to each $k$ subintervals.

If $n$ trapezoids are formed by joining the ends of successive ordinates then the area as approximated by the sum of the areas of these trapezoids is

$$\text{(2)}\qquad A = \frac{h}{2} \sum_{\nu=0}^{n-1} \left(y_\nu + y_{\nu+1}\right)$$

and the error is

$$\text{(2e)}\qquad E = -\frac{(b-a)^3}{12 n^2} f''(z),$$

i.e., an error of the order of $h^2$.
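A short numerical check of the second-order behaviour of the trapezoid formula, again as an illustrative sketch with an assumed integrand ($e^x$ on $[0, 1]$) and an assumed function name:

```python
import math

def trapezoid(f, a, b, n):
    """Formula (2): (h/2) * sum of (y_v + y_{v+1}) over the n trapezoids."""
    h = (b - a) / n
    y = [f(a + v * h) for v in range(n + 1)]
    return (h / 2) * sum(y[v] + y[v + 1] for v in range(n))

exact = math.e - 1.0
err10 = exact - trapezoid(math.exp, 0.0, 1.0, 10)
err20 = exact - trapezoid(math.exp, 0.0, 1.0, 20)
# The error is negative here (f'' > 0) and drops by about 4 when h is halved.
print(err10, err20, err10 / err20)
```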

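Simpson's formula may be obtained by passing second degree parabolas through the ends of three successive ordinates, that is $k = 2$, and gives

$$\text{(3)}\qquad A = \frac{h}{3} \left[\, 2 \sum_{\nu=0}^{m} y_{2\nu} + 4 \sum_{\nu=0}^{m-1} y_{2\nu+1} - \left(y_0 + y_{2m}\right) \right], \qquad n = 2m.$$

The error is

$$\text{(3e)}\qquad E = -\frac{(b-a)^5}{180\, n^4} f^{iv}(z),$$

i.e., an error of the order of $h^4$.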
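The fourth-order behaviour of Simpson's formula can be observed in the same way. This sketch is illustrative: it uses the usual weights 1, 4, 2, ..., 4, 1 written as two sums over even and odd ordinates, and the integrand $e^x$ on $[0, 1]$ is an assumption.

```python
import math

def simpson(f, a, b, n):
    """Simpson's formula for n = 2m equal intervals."""
    if n % 2:
        raise ValueError("n must be even")
    m, h = n // 2, (b - a) / n
    y = [f(a + v * h) for v in range(n + 1)]
    even = sum(y[2 * v] for v in range(m + 1))      # y_0, y_2, ..., y_2m
    odd = sum(y[2 * v + 1] for v in range(m))       # y_1, y_3, ..., y_{2m-1}
    return (h / 3) * (2 * even + 4 * odd - (y[0] + y[n]))

exact = math.e - 1.0
err4 = exact - simpson(math.exp, 0.0, 1.0, 4)
err8 = exact - simpson(math.exp, 0.0, 1.0, 8)
# Halving h divides the error by about 16.
print(err4, err8, err4 / err8)
```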


To illustrate the fact that sometimes the rectangle formula (1) gives a better approximation than the Simpson formula (3) these formulas will be applied to the problem of finding the area under the so-called normal curve of error. From a table⁴ giving five places of decimals it is seen that the ordinates to the right of $x = 4.76\sigma$ and to the left of $x = -4.76\sigma$ are everywhere zero if the equation be written in the form $y = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-x^2/2\sigma^2}$. Divide the interval from $x = -4.8\sigma$ to $x = 4.8\sigma$ into eight partial intervals each of length $1.2\sigma$. Formula (1) gives $A = .99998$ while (3) gives $A = .97834$, the same ordinates being used in each case.

⁴Jas. W. Glover, Tables of Applied Mathematics in Finance, Insurance and Statistics.
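The comparison can be repeated with a few lines of Python. This is an illustrative sketch: it takes $\sigma = 1$ and computes the ordinates exactly instead of reading them from a five-place table, so the last printed digits may differ slightly from the values quoted in the text.

```python
import math

def phi(x):
    """Ordinate of the normal curve with unit standard deviation."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

a, n, h = -4.8, 8, 1.2
y = [phi(a + v * h) for v in range(n + 1)]        # nine ordinates

rect = h * sum(y[:n])                             # rectangle formula (1)
simp = (h / 3) * (y[0] + y[n] + 4 * sum(y[1:n:2]) + 2 * sum(y[2:n:2]))  # Simpson

print(round(rect, 5), round(simp, 5))
# The rectangle sum is far closer to the true area 1 than Simpson's value.
```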

There are three objections to the nature of Simpson’s form- 
ula. They are the lack of smoothness at the points of intersection 
of the parabolas, the unequal weights attached to the odd and 
even numbered ordinates, and the requirement that the number 
of ordinates be odd. 

Catalan⁵ notices the lack of smoothness at the intersections of the parabolas used in setting up Simpson's rule and improves on it by passing parabolas through three successive ordinates and then retaining only the first half of each parabola except in the case of the last three ordinates where it is necessary to retain the whole parabola. To counterbalance the asymmetry introduced by these last three ordinates he repeats the process beginning with the last ordinate and then takes the arithmetic mean of the two results as his formula.

This gives

$$\text{(4)}\qquad A = h \left[\, \sum_{\nu=0}^{n} y_\nu - \frac{5}{8}\left(y_0 + y_n\right) + \frac{1}{6}\left(y_1 + y_{n-1}\right) - \frac{1}{24}\left(y_2 + y_{n-2}\right) \right].$$

And, of course, the error is still of the order of $h^4$. This formula has the additional advantage that it holds no matter whether $n$ is even or odd.
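Catalan's formula is simple to apply once the end corrections are written out. The sketch below is illustrative: it assumes the corrections $-\tfrac{5}{8}$, $+\tfrac{1}{6}$, $-\tfrac{1}{24}$ on the first three and last three ordinates, which follow from the construction just described, and the function name and test data are not from the note.

```python
import math

def catalan(y, h):
    """Sum of all ordinates with end corrections -5/8, +1/6, -1/24."""
    n = len(y) - 1
    if n < 4:
        raise ValueError("this sketch expects at least five ordinates")
    return h * (sum(y)
                - 5.0 / 8.0 * (y[0] + y[n])
                + 1.0 / 6.0 * (y[1] + y[n - 1])
                - 1.0 / 24.0 * (y[2] + y[n - 2]))

# Exact for a cubic: integral of x^3 from 0 to 6 is 324, with n = 6 (even).
cubic = catalan([float(x) ** 3 for x in range(7)], 1.0)
# Works equally for odd n: integral of x^2 from 0 to 7 is 343/3.
square = catalan([float(x) ** 2 for x in range(8)], 1.0)
print(cubic, square)

def error(n):
    h = 1.0 / n
    return (math.e - 1.0) - catalan([math.exp(v * h) for v in range(n + 1)], h)

print(error(8), error(16))   # the error falls by more than a factor of 10
```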

Similarly Crotti⁶ showed that the different weights attached to the odd and even numbered ordinates in Simpson's formula are a disadvantage. And Parmentier⁷ by subtracting Simpson's formula from twice Catalan's obtained a formula in which the weights are the reverse of those in Simpson's. Mansion⁸ gave an alternative derivation of Catalan's formula, his derivation requiring, however, an even number of ordinates.

⁵E. Catalan, Nouvelles Annales, 1st series (1851), pp. 412-415.
⁶Crotti, Il Politecnico 33 (1885), pp. 193-207.
⁷Parmentier, Association française pour l'avancement des sciences, Session Grenoble, 1882.
⁸Mansion, Supplement zu Mathesis 1 (1881).
Catalan's formula may be thought of as the rectangle formula plus three correctional terms involving the first three and the last three ordinates. In the case of an even number of ordinates a formula⁹ involving only two such correctional terms and giving an approximation of the order of the error in the single trapezoid, i.e., of the order of $h^3$, the error in a single trapezoid of width $h$ being $-\frac{h^3}{12} f''(z)$, can be obtained by applying Simpson's formula to the first $2m-1$ ordinates and approximating the remaining area by the trapezoid rule. Repeat the process from the opposite end and take the arithmetic mean of the two results as the quadrature formula. This gives

$$\text{(5)}\qquad A = h \left[\, \sum_{\nu=0}^{n} y_\nu - \frac{7}{12}\left(y_0 + y_n\right) + \frac{1}{12}\left(y_1 + y_{n-1}\right) \right], \qquad n = 2m - 1.$$

It is the only formula with just two correctional terms which will give even this order of approximation in general because any change in the coefficients of these end ordinates will introduce in general an error of the order of the error in the rectangle formula for a single subinterval, i.e., an error of the order of $h^2$.
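The third-order behaviour of the two-term formula can be observed by tripling the number of intervals, which keeps $n$ odd. This is an illustrative sketch: the corrections $-\tfrac{7}{12}$ and $+\tfrac{1}{12}$ are those implied by the construction above, and the integrand $e^x$ on $[0, 1]$ is an assumption.

```python
import math

def two_term(y, h):
    """Sum of all ordinates with end corrections -7/12 and +1/12."""
    n = len(y) - 1
    return h * (sum(y) - 7.0 / 12.0 * (y[0] + y[n]) + 1.0 / 12.0 * (y[1] + y[n - 1]))

def error(n):
    h = 1.0 / n
    return (math.e - 1.0) - two_term([math.exp(v * h) for v in range(n + 1)], h)

err9, err27 = error(9), error(27)     # 10 and 28 ordinates, both even counts
# Dividing h by 3 divides the error by roughly 27, i.e. an error of order h^3.
print(err9, err27, err9 / err27)
```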

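Another important quadrature formula is called the three-eighths rule and is obtained by passing third order parabolas through four successive ordinates. It may be written

$$\text{(6)}\qquad A = \frac{3h}{8} \left[\, 2 \sum_{\nu=0}^{m} y_{3\nu} + 3 \sum_{\nu=0}^{m-1} y_{3\nu+1} + 3 \sum_{\nu=0}^{m-1} y_{3\nu+2} - \left(y_0 + y_{3m}\right) \right], \qquad n = 3m.$$

The error is

$$\text{(6e)}\qquad E = -\frac{(b-a)^5}{80\, n^4} f^{iv}(z),$$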


i.e., an error of the same order as the error corresponding to Simpson's formula. The error terms derived from the Lagrange formula show the advantage of using parabolas of even degree.

⁹Durand, Engineering News, Jan. 1894. J. Lipka, Graphical and Mechanical Computation, Part II, p. 226.
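A direct comparison shows that the three-eighths rule and Simpson's rule have errors of the same order, the former being larger by a constant factor close to $180/80 = 2.25$ for the same $n$. The sketch below is illustrative; the function names and the integrand $e^x$ on $[0, 1]$ are assumptions.

```python
import math

def three_eighths(f, a, b, n):
    """Composite three-eighths rule for n = 3m equal intervals."""
    if n % 3:
        raise ValueError("n must be a multiple of three")
    m, h = n // 3, (b - a) / n
    y = [f(a + v * h) for v in range(n + 1)]
    s0 = sum(y[3 * v] for v in range(m + 1))
    s1 = sum(y[3 * v + 1] for v in range(m))
    s2 = sum(y[3 * v + 2] for v in range(m))
    return (3 * h / 8) * (2 * s0 + 3 * s1 + 3 * s2 - (y[0] + y[n]))

def simpson(f, a, b, n):
    """Composite Simpson rule for even n, weights 1, 4, 2, ..., 4, 1."""
    h = (b - a) / n
    y = [f(a + v * h) for v in range(n + 1)]
    return (h / 3) * (y[0] + y[n] + 4 * sum(y[1:n:2]) + 2 * sum(y[2:n:2]))

exact = math.e - 1.0
te6 = exact - three_eighths(math.exp, 0.0, 1.0, 6)
te12 = exact - three_eighths(math.exp, 0.0, 1.0, 12)
se12 = exact - simpson(math.exp, 0.0, 1.0, 12)
print(te6 / te12)    # about 16: fourth-order convergence
print(te12 / se12)   # about 2.25
```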

Besides the fact that the order of the error is the same as 
that in the case of Simpson’s formula, this three-eighths formula 
has disadvantages similar to those mentioned in the case of Simp- 
son’s formula. There is still a lack of smoothness at the intersec- 
tions of the parabolas; the weights attached to the ordinates are 
as undesirable as before; and the number of partial intervals must 
be a multiple of three. 

It is possible however to do away with these disadvantages by proceeding as follows. Pass a third order parabola through the first four ordinates $y_0, y_1, y_2, y_3$. Retain only the area in the first two partial intervals. Pass a third order parabola through the four ordinates $y_1, y_2, y_3, y_4$ and retain only the area in the central interval. Proceed in this way retaining each time only the area in the central interval until the last four ordinates are reached where it will again be necessary to retain the area in two strips, viz., the last two partial intervals. The sum of these areas gives the required quadrature formula. It is

$$\text{(7)}\qquad A = h \left[\, \sum_{\nu=0}^{n} y_\nu - \frac{2}{3}\left(y_0 + y_n\right) + \frac{7}{24}\left(y_1 + y_{n-1}\right) - \frac{1}{6}\left(y_2 + y_{n-2}\right) + \frac{1}{24}\left(y_3 + y_{n-3}\right) \right].$$

This formula holds for any $n$ greater than or equal to three. From the point of view of the order of the error this formula is, as one would expect, no better than Catalan's formula. As a matter of fact formula (7) can be obtained from formula (4) by subtracting from (4) $\frac{h}{24}\left(\Delta^3 y_{n-3} - \Delta^3 y_0\right)$, a quantity which, in general, is of the order of $h^4$.
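The relation between formulas (4) and (7) is an algebraic identity in the ordinates and can be confirmed on arbitrary data. The sketch below is illustrative: the end corrections used for both formulas are those implied by the constructions described in the text, and the sample ordinates are made up.

```python
def catalan(y, h):
    """Formula (4): end corrections -5/8, +1/6, -1/24."""
    n = len(y) - 1
    return h * (sum(y) - 5 / 8 * (y[0] + y[n]) + 1 / 6 * (y[1] + y[n - 1])
                - 1 / 24 * (y[2] + y[n - 2]))

def formula_7(y, h):
    """Formula (7): end corrections -2/3, +7/24, -1/6, +1/24."""
    n = len(y) - 1
    return h * (sum(y) - 2 / 3 * (y[0] + y[n]) + 7 / 24 * (y[1] + y[n - 1])
                - 1 / 6 * (y[2] + y[n - 2]) + 1 / 24 * (y[3] + y[n - 3]))

def third_difference(y, i):
    """Forward third difference of the ordinates starting at y_i."""
    return y[i + 3] - 3 * y[i + 2] + 3 * y[i + 1] - y[i]

y = [0.3, 1.1, 2.0, 2.4, 1.9, 1.2, 0.8, 0.75, 0.9, 1.6]   # hypothetical ordinates
h, n = 0.25, len(y) - 1
lhs = catalan(y, h) - formula_7(y, h)
rhs = h / 24 * (third_difference(y, n - 3) - third_difference(y, 0))
print(lhs, rhs)   # equal up to rounding
```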

If $n = 4m$ and fourth order parabolas are used in approximating the area in four successive partial intervals then the formula is

$$\text{(8)}\qquad A = \frac{2h}{45} \left[\, 14 \sum_{\nu=0}^{m} y_{4\nu} + 32 \sum_{\nu=0}^{m-1} y_{4\nu+1} + 12 \sum_{\nu=0}^{m-1} y_{4\nu+2} + 32 \sum_{\nu=0}^{m-1} y_{4\nu+3} - 7\left(y_0 + y_{4m}\right) \right].$$

The error is

$$\text{(8e)}\qquad E = -\frac{2\,(b-a)^7}{945\, n^6} f^{vi}(z),$$

i.e., an error of the order of $h^6$.
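The sixth-order behaviour of the fourth degree formula can be seen by halving $h$. This is an illustrative sketch using the standard weights 7, 32, 12, 32, 14, ..., 32, 7 times $2h/45$; the function name and the integrand $e^x$ on $[0, 1]$ are assumptions.

```python
import math

def fourth_degree(f, a, b, n):
    """Composite fourth degree rule for n = 4m equal intervals."""
    if n % 4:
        raise ValueError("n must be a multiple of four")
    h = (b - a) / n
    y = [f(a + v * h) for v in range(n + 1)]
    pattern = (14, 32, 12, 32)
    total = sum(pattern[v % 4] * y[v] for v in range(n + 1)) - 7 * (y[0] + y[n])
    return (2 * h / 45) * total

exact = math.e - 1.0
err4 = exact - fourth_degree(math.exp, 0.0, 1.0, 4)
err8 = exact - fourth_degree(math.exp, 0.0, 1.0, 8)
# Halving h divides the error by roughly 2**6 = 64.
print(err4, err8, err4 / err8)
```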

Several modifications may be made to improve this formula. For instance if $n = 2m$ then apply the fourth degree parabola to the ordinates $y_0, y_1, y_2, y_3, y_4$ and retain only the area in the first three strips. Apply a fourth degree parabola to the ordinates $y_2, y_3, y_4, y_5, y_6$ and retain the area in the two central strips. And so on till in the final step it will be necessary to retain the area in the last three strips. Addition gives the formula

$$\text{(9)}\qquad A = \frac{h}{720} \Big[\, 896 \sum_{\nu=0}^{m} y_{2\nu} + 544 \sum_{\nu=0}^{m-1} y_{2\nu+1} - 653\left(y_0 + y_{2m}\right) + 374\left(y_1 + y_{2m-1}\right) - 256\left(y_2 + y_{2m-2}\right) + 106\left(y_3 + y_{2m-3}\right) - 19\left(y_4 + y_{2m-4}\right) \Big].$$


A formula which holds for any $n$ may be obtained by passing a fourth degree parabola through $y_0, y_1, y_2, y_3, y_4$ and retaining only the area between $y_0$ and $y_1$. Pass a fourth degree parabola through $y_1, y_2, y_3, y_4, y_5$ and retain only the area between $y_1$ and $y_2$. And so on, retaining only the area in one strip, until at the end it will be necessary to retain the area in the last three strips. Repeat the process beginning at the last ordinate and take the arithmetic mean. The result is

$$\text{(10)}\qquad A = h \Big[\, \sum_{\nu=0}^{n} y_\nu - \frac{193}{288}\left(y_0 + y_n\right) + \frac{77}{240}\left(y_1 + y_{n-1}\right) - \frac{7}{30}\left(y_2 + y_{n-2}\right) + \frac{73}{720}\left(y_3 + y_{n-3}\right) - \frac{3}{160}\left(y_4 + y_{n-4}\right) \Big].$$

This formula can be obtained in the case of an even number of ordinates by retaining three strips at the beginning, two from then on, reversing the process and taking the arithmetic mean.
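The end corrections of formula (10) can be worked out mechanically from the construction just described. The following sketch is illustrative and not the author's computation: it integrates the Lagrange basis polynomials of the fourth degree parabola exactly with rational arithmetic, assembles the forward process, averages it with the reversed process, and prints the departures of the first five weights from unity. All function names are assumptions.

```python
from fractions import Fraction

def poly_mul(p, q):
    """Multiply two polynomials given as ascending coefficient lists."""
    r = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r

def poly_integral(p, lo, hi):
    """Exact integral of the polynomial p from lo to hi."""
    return sum(c * (Fraction(hi) ** (k + 1) - Fraction(lo) ** (k + 1)) / (k + 1)
               for k, c in enumerate(p))

def strip_weights(lo, hi, nodes=5):
    """Weights (in units of h) for the area from lo to hi under the
    parabola through the ordinates at 0, 1, ..., nodes - 1."""
    weights = []
    for j in range(nodes):
        basis = [Fraction(1)]
        for i in range(nodes):
            if i != j:
                basis = poly_mul(basis, [Fraction(-i, j - i), Fraction(1, j - i)])
        weights.append(poly_integral(basis, lo, hi))
    return weights

def formula_10_weights(n):
    first, last_three = strip_weights(0, 1), strip_weights(1, 4)
    fwd = [Fraction(0)] * (n + 1)
    for i in range(n - 3):                 # parabola through y_i ... y_{i+4}, first strip
        for j, c in enumerate(first):
            fwd[i + j] += c
    for j, c in enumerate(last_three):     # final parabola: also the last three strips
        fwd[n - 4 + j] += c
    return [(a + b) / 2 for a, b in zip(fwd, reversed(fwd))]

w = formula_10_weights(12)
print([str(c - 1) for c in w[:5]])
```

With twelve intervals the printed corrections are the five fractions attached to $y_0, \ldots, y_4$; the interior weights are all unity.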
Formulas (4), (5), (7) and (10) not only give, in general, at least as good approximations as Simpson's formula, the trapezoid formula, the three-eighths formula, and the fourth degree formula (8) respectively, but in addition have the important property that under certain conditions they show that the simple rectangle formula must give at least as good an approximation as the higher formulas. If $f(x)$ is a function such that the curve $y = f(x)$ actually, or at least for practical purposes, coincides with the $x$-axis to the left of $x = a$ and to the right of $x = b$ then in dividing the interval from $a$ to $b$ into $n$ equal parts each of length $h$ it will not affect the area required if two, one, three or four partial intervals of length $h$ are marked off to the left of $a$ and to the right of $b$, the number of such partial intervals corresponding to (4), (5), (7) and (10) respectively. Hence it is seen that under these conditions (4), (5), (7) and (10) reduce to the simple rectangle formula (1).
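The reduction to the rectangle formula can be seen directly on data whose end ordinates vanish. The sketch below is illustrative and uses Catalan's formula (4), with the end corrections implied by its construction, on made-up ordinates padded with two zero-height intervals at each end.

```python
def rectangle(y, h):
    """Formula (1): h times the sum of y_0 ... y_{n-1}."""
    return h * sum(y[:-1])

def catalan(y, h):
    """Formula (4): end corrections -5/8, +1/6, -1/24."""
    n = len(y) - 1
    return h * (sum(y) - 5 / 8 * (y[0] + y[n]) + 1 / 6 * (y[1] + y[n - 1])
                - 1 / 24 * (y[2] + y[n - 2]))

# Hypothetical ordinates; the first three and last three are zero, i.e. two
# partial intervals have been marked off beyond each end of the curve.
y = [0.0, 0.0, 0.0, 0.2, 0.9, 1.4, 0.7, 0.1, 0.0, 0.0, 0.0]
h = 0.5
print(rectangle(y, h), catalan(y, h))   # identical values
```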

If the curve coincides with the $x$-axis at one end of the interval over which the area is required but does not at the other end then formulas (4), (5), (7) and (10) become respectively

$$\text{(4a)}\qquad A = h \left(\, \sum_{\nu=0}^{n} y_\nu - \frac{5}{8}\, y_n + \frac{1}{6}\, y_{n-1} - \frac{1}{24}\, y_{n-2} \right),$$

$$\text{(5a)}\qquad A = h \left(\, \sum_{\nu=0}^{2m-1} y_\nu - \frac{7}{12}\, y_{2m-1} + \frac{1}{12}\, y_{2m-2} \right),$$

$$\text{(7a)}\qquad A = h \left(\, \sum_{\nu=0}^{n} y_\nu - \frac{2}{3}\, y_n + \frac{7}{24}\, y_{n-1} - \frac{1}{6}\, y_{n-2} + \frac{1}{24}\, y_{n-3} \right),$$

$$\text{(10a)}\qquad A = h \left(\, \sum_{\nu=0}^{n} y_\nu - \frac{193}{288}\, y_n + \frac{77}{240}\, y_{n-1} - \frac{7}{30}\, y_{n-2} + \frac{73}{720}\, y_{n-3} - \frac{3}{160}\, y_{n-4} \right).$$

For example, consider again the normal curve of error and suppose that the area to the left of the ordinate at $x = 0$ is required. Formulas (4a), (5a), (7a) and (10a) apply and for sixteen partial intervals give respectively $A = .49994$, $A = .49956$, $A = .50009$, and $A = .50002$, an extra partial interval to the left of $x = -4.8\sigma$ being used in the case of (5a) in order to have an odd number of intervals for that formula. Using thirty-two partial intervals the same formulas give $A = .49999$, $A = .49994$, $A = .50000$, and $A = .50000$ respectively.
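The sixteen-interval example can be reproduced as follows. This is an illustrative sketch: it takes $\sigma = 1$, computes the ordinates exactly instead of from a five-place table, and uses the one-sided corrections implied by the constructions of (4), (5), (7) and (10), so the fifth decimal may differ by a unit from hand computation.

```python
import math

def phi(x):
    """Ordinate of the normal curve with unit standard deviation."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def one_sided(y, h, corrections):
    """h * (sum of all ordinates + corrections applied to y_n, y_{n-1}, ...)."""
    total = sum(y)
    for j, c in enumerate(corrections):
        total += c * y[len(y) - 1 - j]
    return h * total

CORRECTIONS = {
    "4a": (-5 / 8, 1 / 6, -1 / 24),
    "5a": (-7 / 12, 1 / 12),
    "7a": (-2 / 3, 7 / 24, -1 / 6, 1 / 24),
    "10a": (-193 / 288, 77 / 240, -7 / 30, 73 / 720, -3 / 160),
}

h = 0.3
y16 = [phi((v - 16) * h) for v in range(17)]   # x = -4.8, ..., 0
y17 = [phi((v - 17) * h) for v in range(18)]   # one extra interval for (5a)

results = {name: one_sided(y17 if name == "5a" else y16, h, c)
           for name, c in CORRECTIONS.items()}
for name, area in results.items():
    print(name, round(area, 5))
```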

If, as often happens, the values of ordinates outside the interval over which the area is required are known then even better quadrature formulas may be obtained. For example, suppose that in deriving formula (7) the ordinate $y_{-1}$ at a distance of $h$ to the left of $y_0$ and the ordinate $y_{n+1}$ at a distance $h$ to the right of $y_n$ are known. Then it will not be necessary to retain the areas in double strips at the beginning and end of the interval, and the formula for the area over the interval from $x = a$ to $x = b$ is

$$\text{(11)}\qquad A = h \left[\, \sum_{\nu=0}^{n} y_\nu - \frac{1}{2}\left(y_0 + y_n\right) + \frac{1}{24}\left(y_1 + y_{n-1}\right) - \frac{1}{24}\left(y_{-1} + y_{n+1}\right) \right].$$

It should be noted that in case $y_{-1}$ and $y_{n+1}$ are known Catalan's formula reduces to (11). And, similarly, in the case of the derivation of formula (10) it will be necessary to retain the area in a single strip each time except in the case of the last application of the fourth degree parabola when it will be necessary to retain the area in the two central strips. The formula arrived at is

$$\text{(12)}\qquad A = h \Big[\, \sum_{\nu=0}^{n} y_\nu - \frac{3}{160}\left(y_{-1} + y_{n+1}\right) - \frac{83}{144}\left(y_0 + y_n\right) + \frac{2}{15}\left(y_1 + y_{n-1}\right) - \frac{11}{240}\left(y_2 + y_{n-2}\right) + \frac{11}{1440}\left(y_3 + y_{n-3}\right) \Big].$$


Formulas (11) and (12) reduce to the rectangle formula (1) under the same conditions as in the cases of (4), (5), (7) and (10). Likewise when the curve coincides with the $x$-axis to the left of $x = a$, (11) and (12) become

$$\text{(11a)}\qquad A = h \left(\, \sum_{\nu=0}^{n} y_\nu - \frac{1}{24}\, y_{n+1} - \frac{1}{2}\, y_n + \frac{1}{24}\, y_{n-1} \right),$$

and

$$\text{(12a)}\qquad A = h \left(\, \sum_{\nu=0}^{n} y_\nu - \frac{3}{160}\, y_{n+1} - \frac{83}{144}\, y_n + \frac{2}{15}\, y_{n-1} - \frac{11}{240}\, y_{n-2} + \frac{11}{1440}\, y_{n-3} \right).$$

If we apply formula (11a) to finding the area under the normal curve to the left of the ordinate at $x = 0$ and take $n = 4$, $h = 1.2\sigma$, $a = -4.8\sigma$ then we find $A = .49999$. In other words, in this case (11a) gives as good a result with six ordinates as (4a) or (7a) give with thirty-three ordinates or (5a) with thirty-four ordinates.
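The six-ordinate example can be repeated in a few lines. This is an illustrative sketch with $\sigma = 1$ and exactly computed ordinates, so it gives a value within about one unit in the sixth decimal of one half, where a five-place table gives $.49999$; it assumes the one-sided corrections $-\tfrac{1}{24} y_{n+1} - \tfrac{1}{2} y_n + \tfrac{1}{24} y_{n-1}$.

```python
import math

def phi(x):
    """Ordinate of the normal curve with unit standard deviation."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

n, h = 4, 1.2
# y[0..4] are the ordinates at -4.8, ..., 0; y[5] is the outside ordinate y_{n+1}.
y = [phi((v - n) * h) for v in range(n + 2)]

area = h * (sum(y[:n + 1]) - y[n + 1] / 24 - y[n] / 2 + y[n - 1] / 24)
print(area)   # close to 0.5 with only six ordinates
```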

Quadrature formulas involving parabolas of degree higher 
than four have been obtained but they are to be used with caution 
on account of the great freedom they allow the approximating 
curves. However, modifications similar to those in this paper could 
also be made for these higher formulas. And the effect of any 
number of ordinates outside the ends of the interval could be 
noted. 

This note will be concluded with a remark on the effect of errors in the data giving the values of the ordinates. Suppose the quadrature formula is $A = h\left(a_0 y_0 + a_1 y_1 + a_2 y_2 + \cdots + a_n y_n\right)$ and suppose further that each $y_i$ is subject to an error $e_i$, $i = 0, 1, 2, 3, \ldots, n$. If $e$ is the greatest of the absolute values of the $e_i$ then the error in $A$ cannot be greater than $he\left(a_0 + a_1 + a_2 + \cdots + a_n\right)$ if $a_0, a_1, a_2, \ldots, a_n$ are all positive, as will be true if parabolas of the fourth degree or lower are used. But $h\left(a_0 + a_1 + a_2 + \cdots + a_n\right) = (b - a)$ if the area is to be found from $x = a$ to $x = b$. Hence the error in $A$ due to errors in the data is not greater than $e(b - a)$. When parabolas of degree higher than four are used the coefficients in the quadrature formula are not always positive.
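The bound $e(b - a)$ can be illustrated with Simpson's coefficients, which are all positive. The sketch below is illustrative: the interval, the number of ordinates, the size of $e$ and the pattern of data errors are all assumed for the example.

```python
def simpson_coefficients(n):
    """Coefficients a_0 ... a_n of A = h * sum(a_i * y_i) for Simpson's rule."""
    return [(1 if v in (0, n) else 4 if v % 2 else 2) / 3 for v in range(n + 1)]

a, b, n = 0.0, 2.0, 8
h = (b - a) / n
coeff = simpson_coefficients(n)

e = 5e-6                                   # greatest absolute error in an ordinate
data_errors = [e if v % 2 else -e for v in range(n + 1)]   # hypothetical errors

shift = h * sum(c * d for c, d in zip(coeff, data_errors))  # resulting error in A
bound = e * (b - a)

print(h * sum(coeff))      # equals b - a
print(abs(shift), bound)   # the induced error stays within the bound
```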


—————— 





